question

crisansou avatar image
1 Vote"
crisansou asked PRADEEPCHEEKATLA-MSFT commented

Azure Synapse Workspace - How to read an Excel file from Data Lake Gen2 using Pandas or PySpark?

Hi,

In Azure Synapse Workspace is it possible to read an Excel file from Data Lake Gen2 using Pandas/PySpark? If so, can you show an example, please?

Example:

import pandas as pd

file_path = '/dbfs/mnt/raw/2020/06/01/file.xlsx' or 'abfss://raw@dlsname.dfs.core.windows.net/2020/06/01/file.xlsx'
df = pd.read_excel(file_path)

Best regards,
Cristina

azure-synapse-analytics
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

PRADEEPCHEEKATLA-MSFT avatar image
3 Votes"
PRADEEPCHEEKATLA-MSFT answered someara50 commented

Hello @crisansou,

Welcome to Microsoft Q&A platform.

The method pandas.read_excel does not support using wasbs or abfss scheme URL to access the file. For more details, please refer pandas.read_excel.
So if you want to access the file with pandas, I suggest you create a sas token and use https scheme with sas token to access the file or download the file as stream then read it with pandas.

Steps to read excel file from Azure Synapse notebooks:

Step1: Create SAS token via Azure portal.

Select your Azure Storage account => Under settings => Click on Shared access signature

24374-image.png


Step2: Read excel file from Azure Data Lake Storage gen2.

 ReadExcel=pd.read_excel('https://<account name>.dfs.core.windows.net/<file system>/<path>?<sas token>')
 print(ReadExcel)

24317-image.png

Hope this helps. Do let us know if you any further queries.


Do click on "Accept Answer" and Upvote on the post that helps you, this can be beneficial to other community members.



image.png (117.8 KiB)
image.png (90.4 KiB)
· 10
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @crisansou,

Just checking in to see if the above answer helped. If this answers your query, do click “Accept Answer” and Up-Vote for the same. And, if you have any further query do let us know.

1 Vote 1 ·

Hi @PRADEEPCHEEKATLA-MSFT ,

Thank you so much, it worked!

Best regards,
Cristina

0 Votes 0 ·

Hello @crisansou,

Glad to know that your issue has resolved.

1 Vote 1 ·
Show more comments

I seem to be failing here: following the steps above I'm getting:


URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)>
Traceback (most recent call last):

when I do the read,

Any guidance ?

1 Vote 1 ·

I HAVE THE SAME ISSUE!

0 Votes 0 ·

Were you ever able to get a solution to the SSL: CERTIFICATE_VERIFY_FAILED error? I'm seeing the same thing in my workspace.

0 Votes 0 ·

Hi Pradeep,
Your answer helped a lot but I am facing another issue(screenshot attached) with the above solution:

  ImportError : Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
  Traceback (most recent call last):
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/pandas/util/_decorators.py", line 208, in wrapper
      return func(*args, **kwargs)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 310, in read_excel
      io = ExcelFile(io, engine=engine)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 819, in __init__
      self._reader = self._engines[engine](self._io)

59941-error.png


0 Votes 0 ·
error.png (59.5 KiB)

Hi Waheed - I had the same error and managed to correct by creating an environments file and uploading the Spark Pool resource in Azure:

63007-image.png

This is a simple .txt containing

xlrd==1.1.0

See Microsoft docs:
https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries

Which installs this module onto the Spark Pool so it can be used in your scripts.


0 Votes 0 ·
image.png (81.3 KiB)
BanothHussain-6978 avatar image
0 Votes"
BanothHussain-6978 answered PRADEEPCHEEKATLA-MSFT commented

would this work for Azure Databricks notebooks as well?

I tried the same approach but when I try to write an excel file it basically says "No engine for filetype: 'xlsx?sv=xxxxxxxxxxxx'

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @BanothHussain-6978,

Since this thread is too old, I would recommend creating a new thread on the same forum with as much details about your issue as possible. That would make sure that your issue has better visibility in the community.

0 Votes 0 ·