balaa-uk asked KranthiPakala-MSFT commented

How do I split a large file into multiple files with a specific number of rows in each file?

I have a large source file that I want to split into multiple files of 10K rows each. Data flow lets me split the data into a set number of partitions, but that does not cover my case: I don't want to split the source file at all if it has 10K rows or fewer. Anything above 10K rows should be split into 10K-row chunks, with the remaining rows going into a final file.

For example:

12K rows => produces 2 files: one with 10K rows, another with 2K
20K rows => produces 2 files: each with 10K rows
9K rows => produces 1 file
20.1K rows => produces 3 files: two with 10K rows and one with the remaining rows, and so on

azure-data-factory

Hi @balaa-uk,

Checking to see if the suggestion below from @MarkKromer-8019 was helpful. If it answers your query, please click “Accept Answer” and/or up-vote it, as that may help other community members reading this thread. If you have any further questions, do let us know.

Thanks


Hi @balaa-uk,

We still have not heard back from you. Just wanted to check whether you still need assistance with this query. If the suggestion below helps, please consider clicking "Accept Answer" and "Up-vote" on the post that helped you, as it can be beneficial to other community members.

Thank you


1 Answer

MarkKromer-MSFT answered

Use the techniques in the blog post below to create a formula that sizes the partitions dynamically:

https://kromerbigdata.com/2021/03/04/dynamic-data-flow-partitions-in-adf-synapse/

In my example, I used a hardcoded value for the target file size, but you can use case() or iif() in the size expression to apply the rule you described above.
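
As a rough illustration of the rule from the question (this is not taken from the blog post), the partition count can be worked out as below. It is a minimal Python sketch of the arithmetic only: the names `ROWS_PER_FILE` and `partition_count` are hypothetical, and the same condition would still need to be rewritten with iif() or case() in the data flow size expression. It also only decides how many files are produced; how rows are distributed across those files depends on the partitioning option chosen.

```python
ROWS_PER_FILE = 10_000  # target chunk size taken from the question


def partition_count(row_count: int, rows_per_file: int = ROWS_PER_FILE) -> int:
    """Return how many output files are needed for row_count rows.

    Hypothetical helper: sources at or below the threshold stay in one file,
    larger sources get one file per full chunk plus one for any remainder.
    """
    if row_count <= rows_per_file:
        return 1  # do not split small sources
    # Integer ceiling division: rounds up without going through floats.
    return -(-row_count // rows_per_file)


if __name__ == "__main__":
    # The four cases listed in the question.
    for rows in (12_000, 20_000, 9_000, 20_100):
        print(rows, "rows =>", partition_count(rows), "file(s)")
```

For the cases in the question this gives 2, 2, 1 and 3 files respectively.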
