alexandertikhomirov-7179 asked · ShaikMaheer-MSFT answered

Accessing Azure ADLS Gen2 with PySpark on Windows

Hello,
This may not be the right tag for Azure Synapse, but my team is evaluating a local development experience for writing code for Synapse Spark job definitions. What is the best practice in this case?
I have a working local development environment (VS Code and PySpark) and can execute PySpark code there, so in theory I could reuse the same scripts in a Synapse Spark job definition. However, I would like to use the command

spark.read.load('abfss://XXX@XXX.dfs.core.windows.net/XXX/file.csv', format='csv')

to read a file from ADLS Gen2 in both environments, locally and on the remote Spark cluster. Is this possible? Does anyone have experience with this?

Locally, this does not work.

azure-synapse-analytics

1 Answer

ShaikMaheer-MSFT answered

Hi @alexandertikhomirov-7179,

Thank you for posting your query on the Microsoft Q&A platform.

To access ADLS Gen2 from a Windows machine, you need to perform the following high-level steps:

  • Set up the environment

  • Configure your storage account in Hadoop

  • Connect to your storage account.

All of the above steps are documented in detail at the link below. Kindly follow the steps and see if that helps. Please let us know how it goes.
https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/connect-to-azure-storage
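
For reference, a minimal local PySpark sketch of the last two steps (configuring the storage account and connecting to it) might look like the following. The storage account name, container, access key, and connector version are placeholders, and account-key authentication is only one of the supported options:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("local-adls-gen2")
    # Pull in the Azure/ABFS connector via Ivy; the version is illustrative and
    # should match the Hadoop version bundled with your local Spark.
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1")
    # Account-key auth (assumption); the "spark.hadoop." prefix copies the
    # setting into the Hadoop configuration used by the abfss:// filesystem.
    .config("spark.hadoop.fs.azure.account.key.mystorageaccount.dfs.core.windows.net",
            "<storage-account-access-key>")
    .getOrCreate()
)

# Same path format as in the question; container, account, and path are placeholders.
df = spark.read.load(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/file.csv",
    format="csv",
    header=True,
)
df.show()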


alexandertikhomirov-7179 commented:

This manual is for .NET for Apache Spark. Are you sure the same manual will help when I am trying to read with PySpark? When I tried it after all of that preparation, it did not help me.

I am also a bit confused by this guideline, because it assumes that I have already installed .NET for Apache Spark on my PC using another guide, https://docs.microsoft.com/en-us/dotnet/spark/tutorials/get-started?tabs=windows, which proposes installing completely different binaries (Spark, Hadoop, winutils). For example, one guide uses Spark bundled with Hadoop, while the other says to install Spark without Hadoop. So it is not very straightforward.
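
For what it is worth, the Windows-specific part of "set up the environment" usually comes down to pointing Spark at a Hadoop folder that contains winutils.exe. A minimal sketch, with purely illustrative paths (assuming Spark is unpacked under C:\spark and winutils.exe sits in C:\hadoop\bin):

import os

# Illustrative locations only; adjust to wherever you unpacked the binaries.
os.environ["SPARK_HOME"] = r"C:\spark"
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["PATH"] = (
    os.environ["HADOOP_HOME"] + r"\bin;"
    + os.environ["SPARK_HOME"] + r"\bin;"
    + os.environ["PATH"]
)

# With the environment in place, a plain SparkSession should start normally.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-check").getOrCreate()
print(spark.version)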

ShaikMaheer-MSFT replied:

Hi @alexandertikhomirov-7179, sorry for the delay in responding. Did you get a chance to look at the link below? The path format you are using appears correct, so I suspect another part of the code may be causing the issue. Kindly compare your code with the code at the link below and see if that gives any leads.
https://stackoverflow.com/questions/68817740/reading-azure-datalake-gen2-file-from-pyspark-in-local
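
If the storage account key cannot be used, a service-principal (OAuth) configuration is the other common pattern for abfss:// access. A hedged sketch with placeholder account, container, tenant, client ID, and secret (not taken from the linked answer):

from pyspark.sql import SparkSession

# Same connector as in the earlier sketch; the version is illustrative.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-azure:3.3.1")
    .getOrCreate()
)

acct = "mystorageaccount.dfs.core.windows.net"  # placeholder account

# If these session-level settings do not reach the Hadoop configuration in your
# local build, set them instead as "spark.hadoop."-prefixed builder configs.
spark.conf.set(f"fs.azure.account.auth.type.{acct}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{acct}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{acct}", "<application-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{acct}", "<client-secret>")
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{acct}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

df = spark.read.load(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/folder/file.csv",
    format="csv",
)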

If you have already found a solution, please feel free to share it. That way it will be helpful to the community as well.
