Is there any limitation of file size for reading a excel file through dataflow ADF

Alivia Banerjee 20 Reputation points
2024-04-16T14:53:03.7166667+00:00

I am trying to read an excel file from ADLS location with data flow. My data flow have some fancy transformation to change the header format. When I am using a small test excel file of 100 Kb I am able to load data properly to sync. But when my file size is 8MB I am not able to read the source in dataflow. It is giving the below error.

User's image

Error:

at Source 'ListenSource': empty String

What is the issue behind this?

Excel
Excel
A family of Microsoft spreadsheet software with tools for analyzing, charting, and communicating data.
1,467 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,570 questions
{count} votes

Accepted answer
  1. phemanth 5,825 Reputation points Microsoft Vendor
    2024-04-18T06:20:33.4733333+00:00

    @Alivia Banerjee Welcome to Microsoft Q&A platform and thanks for posting your question.

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others "I'll repost your solution in case you'd like to accept the answer.

    Ask: I am trying to read an excel file from ADLS location with data flow. My data flow have some fancy transformation to change the header format. When I am using a small test excel file of 100 Kb I am able to load data properly to sync. But when my file size is 8MB I am not able to read the source in dataflow. It is giving the below error.

    User's image

    Error:

    at Source 'ListenSource': empty String

    What is the issue behind this?

    Solution: The issue is with empty strings as the file has 2 header row the first one having null for many columns.

    User's image

    Though with small size of the same structure Dataflow was able to read and process properly but when the size increases it starts giving this error. As a workaround I have saved the excel into csv using a copy activity in prior stage. I used the range option as A2 to ignore the first header for the main data part. I used another copy activity to get the first header from the column where we do not have any further null values as for these few columns I needed the first header as actual header.

    User's image

    and now the dataflow combines the two csv and produces the data in required format.It is not having any issue reading the same data.

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Amira Bedhiafi 15,446 Reputation points
    2024-04-17T09:33:58.09+00:00

    Today, ADF does not support streaming reads for Excel in its Copy Activity, necessitating the loading of the entire Excel file into memory for cell-by-cell processing. This approach can lead to out-of-memory errors or performance issues when dealing with large files. Unfortunately, this limitation persists in handling Excel files within Copy Activity.

    However, I've explored some potential workarounds that you might consider implementing:

    1. Utilize a Dataflow activity to transfer a large Excel file to another data store. In Dataflow, Excel supports streaming reads, which are less demanding on CPU and memory resources. This method is suitable for handling files in the gigabyte range.
    2. Divide a large Excel file into multiple smaller ones—for example, split a 100 MB file into ten 10 MB files—and then use the Copy Activity to transfer these smaller files.
    3. Manually convert or save the large Excel file in CSV format, then use the Copy Activity to transfer the CSV file.
    4. Set up and maintain a Self-Hosted Integration Runtime (SHIR) on a high-capacity machine, then use the Copy Activity to transfer the large Excel file to another data store using this more robust integration runtime.

    Old thread : https://learn.microsoft.com/en-us/answers/questions/669288/adf-copy-operation-for-excel-files-is-not-working

    0 comments No comments

  2. Alivia Banerjee 20 Reputation points
    2024-04-18T05:34:03.69+00:00

    The issue is with empty strings as the file has 2 header row the first one having null for many columns.

    User's image

    Though with small size of the same structure Dataflow was able to read and process properly but when the size increases it starts giving this error. As a workaround I have saved the excel into csv using a copy activity in prior stage. I used the range option as A2 to ignore the first header for the main data part. I used another copy activity to get the first header from the column where we do not have any further null values as for these few columns I needed the first header as actual header.

    User's image

    and now the dataflow combines the two csv and produces the data in required format.It is not having any issue reading the same data.