ALBERTOJUNIOR-9931 asked PRADEEPCHEEKATLA-MSFT commented

Pyspark HDInsight DataFactory Eviroment Variable

I'm facing a problem with PySpark, Data Factory, and HDInsight.

I created an HDInsight cluster with 2 head nodes and 2 worker nodes.

I created an environment variable on every server, like this (sudo does not apply to a shell redirection, so tee is used to append as root):

echo 'TEST=server' | sudo tee -a /etc/environment

After that, on every server I opened a terminal and ran:

  • pyspark

  • from os import environ as env

  • test = env.get("TEST")

  • print(test)
    The code prints - server
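The interactive check above can be sketched as a small standalone script (the variable is set in-process here to simulate what /etc/environment provides on each node):

```python
import os

# Simulate the variable that /etc/environment would provide on each node
# (assumption: it was exported as TEST=server).
os.environ.setdefault("TEST", "server")

# os.environ.get returns None when the variable is not visible to the
# current process, which is the failure mode seen later in this thread.
test = os.environ.get("TEST")
print(test)
```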

But when I use Data Factory and execute spark-submit, I can't access the value of my variable.





Tags: azure-hdinsight, dotnet-ml-big-data

Hello @ALBERTOJUNIOR-9931,

Welcome to the Microsoft Q&A platform.

When you say "But when I use datafactory and execute spark-submit I can't access the value of my variable", could you please provide more details on how you are using ADF to execute the job and read the variable declared on the HDInsight cluster?


Hello @ALBERTOJUNIOR-9931,

Just checking in to see if you have had a chance to look at the previous response. We need the requested details to understand and investigate this issue further.


Hey, thanks for your question.

First, I created a script using PySpark; the first thing it does is fetch my credentials, whose names are stored in environment variables:



 from os import environ as env
 from azure.identity import ClientSecretCredential
 from azure.keyvault.secrets import SecretClient

 # Read the Key Vault name from an environment variable.
 AZURE_KEYVAULT_NAME = env.get("AZURE_KEYVAULT_NAME")
 print(AZURE_KEYVAULT_NAME)


I opened a terminal and executed pyspark on every node, and on my client I can see the printed result.

After that, I opened Data Factory, submitted the same code, and executed it; the job was sent through YARN (HDInsight), and then I received this error:

File "hive_to_blobstorage_answers.py", line 6, in <module>
import configDF as config
File "/mnt/resource/hadoop/yarn/local/usercache/livy/appcache/application_1631130560230_0021/container_1631130560230_0021_05_000001/pyfiles/configDF.py", line 14, in <module>
KEYVAULT_URI = 'https://'+AZURE_KEYVAULT_NAME+'.vault.azure.net/'
TypeError: cannot concatenate 'str' and 'NoneType' objects
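A defensive version of the failing line (a sketch with a hypothetical helper; the real fix is making the variable visible inside the YARN container) turns the opaque TypeError into a clear message:

```python
import os

def keyvault_uri(env=os.environ):
    # Inside a YARN container /etc/environment may not have been sourced,
    # so get() can return None; fail fast with an explicit message instead
    # of raising the TypeError shown in the traceback above.
    name = env.get("AZURE_KEYVAULT_NAME")
    if name is None:
        raise RuntimeError("AZURE_KEYVAULT_NAME is not set in this container")
    return "https://" + name + ".vault.azure.net/"
```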


I checked and all my environment variables were set, because I put them in /etc/environment, but from the YARN cluster I can't see them.

In a nutshell, I need to know in which file I have to put my environment variables so that YARN sees them.








Hello @ALBERTOJUNIOR-9931,

The error in KEYVAULT_URI = 'https://'+AZURE_KEYVAULT_NAME+'.vault.azure.net/' is a string concatenation failure: "AZURE_KEYVAULT_NAME" has the type NoneType, because env.get found no such variable inside the YARN container.

Checking that "AZURE_KEYVAULT_NAME" is set before concatenating (and making the variable visible to the container) should help.
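As a possible direction (a sketch, assuming the standard Spark-on-YARN settings spark.yarn.appMasterEnv.* and spark.executorEnv.*; the Key Vault name below is a placeholder): /etc/environment is not sourced by YARN containers, but the variable can be forwarded to them through the Spark configuration instead:

```python
# Hypothetical value; substitute the real Key Vault name.
KEYVAULT_NAME = "my-keyvault"

# Spark-on-YARN settings that forward an environment variable into the
# application master and the executor containers, where os.environ.get
# can then read it.
spark_conf = {
    "spark.yarn.appMasterEnv.AZURE_KEYVAULT_NAME": KEYVAULT_NAME,
    "spark.executorEnv.AZURE_KEYVAULT_NAME": KEYVAULT_NAME,
}

# The same pairs can be passed as --conf key=value to spark-submit, or via
# SparkSession.builder.config(key, value) in the driver script.
for key, value in spark_conf.items():
    print(f"--conf {key}={value}")
```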


0 Answers