SatyaD-1257 asked PRADEEPCHEEKATLA-MSFT commented

Azure Synapse Workspace - Reading a file as File Stream

Following the link azure-synapse-workspace-how-to-read-an-excel-file.html, I am trying to read a file in an Azure Synapse (PySpark) notebook, but I keep getting a "file not found" error:
FileNotFoundError: [Errno 2] No such file or directory

I am reading the file with Python's built-in open() because I want the file as a file stream:

f_read = open(mountfilename, "r")
records = Medline.parse(f_read)

Medline is a custom library that expects a file handle to be passed. I am using the SAS token to get the file over HTTPS. I am able to download the file successfully in a browser, which confirms that the file exists and the token works, but the same URL fails in the PySpark notebook. Am I missing any configuration here?

Edit: I was able to achieve this on Databricks with the file mapped to DBFS and reading the file like above snippet. Trying to see if I can run the Databricks notebooks on Synapse with minimal changes.
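[Editor's note] One way to sketch the "file stream over HTTPS" idea above, not verified in Synapse: fetch the file with the SAS URL using urllib and wrap the bytes in an in-memory text stream, so a parser like Medline.parse still receives a file handle. Function names here are illustrative:

```python
import io
import urllib.request

def bytes_to_text_stream(data: bytes, encoding: str = "utf-8") -> io.StringIO:
    """Wrap raw bytes in a file-like text stream that a parser can consume."""
    return io.StringIO(data.decode(encoding))

def open_sas_url_as_stream(url: str) -> io.StringIO:
    """Fetch a file over HTTPS (e.g. an ADLS Gen2 blob URL with a SAS token)."""
    with urllib.request.urlopen(url) as response:
        return bytes_to_text_stream(response.read())

# Usage (hypothetical URL):
# f_read = open_sas_url_as_stream("https://<account>.blob.core.windows.net/raw/file.txt?<sas-token>")
# records = Medline.parse(f_read)
```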

azure-synapse-analytics

Hello @SatyaD-1257,

In order to repro what you are trying to achieve, could you please provide as many details as possible about your scenario?

  • Could you provide the steps you are trying, along with a screenshot of the error message?

  • When you say "I was able to achieve this on Databricks with the file mapped to DBFS and reading the file like the above snippet", could you please share the steps, along with how you installed the Medline custom library on Azure Databricks?


Hi @PRADEEPCHEEKATLA-MSFT, the issue is not with the Medline library; the Medline code can be commented out. I left it there to indicate why I need a file stream.

  1. I have an Azure Data Lake Storage (Gen2) account with a container called 'raw' that hosts a text file.

  2. My Synapse workspace is attached to the ADLS Gen2 account.

  3. I have a PySpark notebook that gets the file from ADLS Gen2; the goal is to read it with Python's open() to get a file stream that can be used in the next steps.

  4. When I read the file using the SAS token over HTTPS, I get the 'File Not Found' exception. (screenshot1)

  5. Before trying the code, I checked that the file exists by opening the same HTTPS link with the SAS token in a browser, and I was able to download the file successfully. Let me know if you need any additional details.


50892-screenshot1.gif



Hi @SatyaD-1257, I am facing the same issue. I want to open an ADLS Gen2 file as a file stream for the next processing steps in the notebook. I noticed you mentioned that you set up a blob client to get the stream. Could you please share the steps you followed (preferably with screenshots)?

Thank you so much!

PRADEEPCHEEKATLA-MSFT answered DheerajAwale-4542 commented

Hello @SatyaD-1257,

As per my repro, text files in ADLS Gen2 cannot be read directly with the Python built-in open() function. When I tried reading the text file via the ADLS Gen2 URL, I got the same error message as you:

 Traceback (most recent call last):
 FileNotFoundError: [Errno 2] No such file or directory: 'https://chepragen2.blob.core.windows.net/filesystem/flightdata/1.txt?sv=2019-12-12&ss=bfqt&srt=sco&sp=rwdlacupx&se=2020-12-24T13:28:18Z&st=2020-12-24T05:28:18Z&spr=https&sig=XXXXXXXXXXXXXXXXXXXXXXXXXX'

51071-image.png

To read a file using the ADLS Gen2 SAS token, I would request you to use pandas as shown below:

 import pandas as pd
    
 data = pd.read_csv("https://chepragen2.blob.core.windows.net/filesystem/flightdata/1.txt?sv=2019-12-12&ss=bfqt&srt=sco&sp=rwdlacupx&se=2020-12-24T13:28:18Z&st=2020-12-24T05:28:18Z&spr=https&sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
 print(data)

51081-image.png

Hope this helps. Do let us know if you have any further queries.


  • Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.

  • Want a reminder to come back and check responses? Here is how to subscribe to a notification.



Hi @PRADEEPCHEEKATLA-MSFT ,

As the requirement is to get a file stream, not a DataFrame, I didn't use the suggested solution. Instead, I ended up creating a BlobClient to get the stream and using it in the downstream function.
Thanks for your inputs!
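[Editor's note] For readers asking what the BlobClient approach might look like, here is a minimal sketch. It assumes the azure-storage-blob package and a valid SAS token; the helper names and values are illustrative, not the original poster's exact code:

```python
import io

def build_blob_sas_url(account: str, container: str, blob_path: str, sas_token: str) -> str:
    """Assemble the HTTPS blob URL that BlobClient.from_blob_url expects."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob_path}?{sas_token.lstrip('?')}"

def open_blob_as_stream(sas_url: str) -> io.BytesIO:
    """Download the blob and wrap its bytes in a seekable file-like stream."""
    from azure.storage.blob import BlobClient  # pip install azure-storage-blob
    blob_client = BlobClient.from_blob_url(sas_url)
    return io.BytesIO(blob_client.download_blob().readall())

# Usage (hypothetical values):
# url = build_blob_sas_url("mystorageacct", "raw", "medline/records.txt", "sv=...&sig=...")
# f_read = io.TextIOWrapper(open_blob_as_stream(url), encoding="utf-8")
# records = Medline.parse(f_read)
```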


Hello @SatyaD-1257,

Glad to know that your issue has been resolved, and thanks for sharing the solution; it might be beneficial to other community members reading this thread.


Do click on "Accept Answer" and upvote the post that helps you; this can be beneficial to other community members.


@SatyaD-1257, would you please share how you created the BlobClient in PySpark? I am facing the same issue.

PrateekNarula-5198 answered PRADEEPCHEEKATLA-MSFT commented

hello @PRADEEPCHEEKATLA-MSFT
How can I read a .dcm file in the same construct? It is an image format for medical images. I've stored the files on ADLS Gen2 and want to access them in a notebook.
207191-screenshot-2022-05-31-at-181407.png

This is how I'm trying to access them: I'm using pydicom, and the notebook is unable to resolve the file destination for the .dcm file, although the CSV file is fine.
207168-screenshot-2022-05-31-at-181552.png

data['dicom'] is the path to the file and it is correct. An example path is
abfss:/[adls path].dfs.core.windows.net/<container name>/stage_2_train_images/0004cfab-14fd-4e49-80ba-63a80b6bddd6.dcm
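[Editor's note] A hedged sketch for .dcm files: pydicom.dcmread accepts a file-like object, so the same stream approach applies once the bytes are fetched. An abfss:// path can be translated to its https:// blob endpoint for a SAS-based read. The helper names are illustrative, and the endpoint mapping assumes the standard ADLS Gen2 account layout:

```python
import io

def abfss_to_https(abfss_path: str) -> str:
    """Translate abfss://<container>@<account>.dfs.core.windows.net/<path>
    into the equivalent https:// blob-endpoint URL."""
    rest = abfss_path[len("abfss://"):]
    container, _, remainder = rest.partition("@")
    host, _, blob_path = remainder.partition("/")
    account = host.split(".")[0]
    return f"https://{account}.blob.core.windows.net/{container}/{blob_path}"

def read_dicom_from_bytes(data: bytes):
    """Parse DICOM content from raw bytes (pip install pydicom)."""
    import pydicom
    return pydicom.dcmread(io.BytesIO(data))

# Usage (hypothetical): fetch the blob bytes via a SAS URL or BlobClient, then:
# ds = read_dicom_from_bytes(raw_bytes)
```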



Hello @PrateekNarula-5198,

Since this thread is quite old, I would recommend creating a new thread on the same forum with as many details about your issue as possible. That will give your issue better visibility in the community.
