How to check in ADF if a csv file contains data or not when file can contain more than 5000 rows (limitation for lookup activity)

Gużewski, Jacek 75 Reputation points
2024-04-30T07:41:26.0433333+00:00

Hi,

How can I check if a csv file contains any data (file always contains header row)?
Problem is that I can get file where is more than 5000 rows (sometimes I can get 60k).

In that case I face the issue of lookup limitation:
ErrorCode=UserErrorComputePluginFailure,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The size of lookup activity result exceeds the limitation 4194304.,Source=,''Type=Microsoft.DataTransfer.Execution.Core.ExecutionException,Message=The size of lookup activity result exceeds the limitation 4194304.,Source=Microsoft.DataTransfer.TransferTask,'

What is the other way to check If I have or not any row with data in the file?

Azure Blob Storage
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,468 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,684 questions
{count} votes

2 answers

Sort by: Most helpful
  1. AnnuKumari-MSFT 31,561 Reputation points Microsoft Employee
    2024-04-30T14:59:05.0033333+00:00

    Hi @Gużewski, Jacek ,

    As per my understanding you are trying to know if csv file contains any data or not, or if it contains more than 5k rows, so basically you want to know the count of the records present in the csv file.

    If it would have been a SQL table, it would be an easier task just to use a select count(*) from table query . But in your case, since it's csv file, there could be two options you can try:

    • You can use 'copy activity' to move the data from csv file to SQL table which would lead to provide the rowcount as the copy activity output which can be further used in the pipeline to check if the count is greater than 5k rows then do this or else do that :
        @activity('Copy data1').output.rowsRead
        
      
    • Second approach would be to use dataflow to get the count of rows [Source (csvfile) -> DerivedColumn (rowcount) -> Aggregate (count(rowcount)) -> sink (cache the output of aggregate)]
      Watchout this video for reference : https://www.youtube.com/watch?v=ZUOowfJrrf4
      In case there is not id/Sno column, then we can use surrogate key to create id column first

    Hope it helps. Kindly accept the answer by clicking on Accept answer button. Thankyou


  2. AnnuKumari-MSFT 31,561 Reputation points Microsoft Employee
    2024-05-15T15:19:36.27+00:00

    @Gużewski, Jacek ,

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others ", I'll repost your solution in case you'd like to accept the answer .

    Checking the first row is empty or not would help in determining if the file contains data or is empty with the help of following expression: @empty(activity('Lookup1').output.firstrow)

    Kindly accept the answer by clicking on Accept answer button. Thankyou

    0 comments No comments