Is there any way to partition data by file size?

JunJian Xia
2021-03-04


The only available option is to partition data by number of partitions.


Accepted answer

    KranthiPakala-MSFT (Microsoft Employee)
    2021-03-04

    Hi @JunJian Xia,

    Thanks for your query. Unfortunately, there is no out-of-the-box feature to partition data by file size. I would recommend logging a feature request in the ADF feedback forum: https://feedback.azure.com/forums/270578-azure-data-factory. All feedback shared in that forum is actively monitored and reviewed by the ADF engineering team. Please share the feedback thread once you have posted it, so that other users with a similar idea can upvote and comment on your suggestion.

    If your source is a file, you can parameterize the number of partitions based on the incoming source file size. To do that, use a Get Metadata activity to retrieve the source file size and pass it as a parameter to the Data Flow, which calculates the partition count from it. Below is an example.

    (Example pipeline and Data Flow screenshots omitted.)
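
    As a rough sketch of the arithmetic involved (this is ordinary Python, not ADF code, and the 256 MB target partition size is an assumed tuning value), the partition count passed to the Data Flow would be derived from the file size like this:

    ```python
    # Hypothetical target size per partition; tune to your workload.
    TARGET_PARTITION_BYTES = 256 * 1024 * 1024

    def partition_count(file_size_bytes: int) -> int:
        """Ceiling division: enough partitions that each holds at most the target size."""
        return max(1, -(-file_size_bytes // TARGET_PARTITION_BYTES))

    # Example: a 1.5 GB source file yields 6 partitions of roughly 256 MB each.
    print(partition_count(1_500_000_000))  # 6
    ```

    In the pipeline itself, the same ceiling division can be expressed with the expression-language functions div and add applied to the size field returned by Get Metadata.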

    If the source is a database table, you can use a Lookup activity to partition based on row count, as explained in this Stack Overflow thread: Azure Data Factory split file by file size.
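
    To illustrate the row-count approach (a hedged sketch: the rows-per-partition figure is an assumption, and the row count would come from a Lookup query such as a SELECT COUNT(*) against the source table):

    ```python
    # Hypothetical tuning value: how many rows each partition should hold.
    ROWS_PER_PARTITION = 1_000_000

    def partition_ranges(row_count: int) -> list[tuple[int, int]]:
        """Split rows 1..row_count into contiguous (start_row, end_row) ranges."""
        parts = max(1, -(-row_count // ROWS_PER_PARTITION))
        size = -(-row_count // parts)  # rows per partition, rounded up
        return [(i * size + 1, min((i + 1) * size, row_count)) for i in range(parts)]

    # A 2.5M-row table splits into three ranges that parallel queries can filter on.
    print(partition_ranges(2_500_000))
    # [(1, 833334), (833335, 1666668), (1666669, 2500000)]
    ```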

    Another option is to create a Custom activity with your own data movement or transformation logic and use that activity in a pipeline.
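
    As a minimal sketch of what such a Custom activity's executable might do (the paths, file format, and 256 MB chunk target are assumptions; a real Custom activity would typically read its settings from the ADF-supplied activity.json), here is a split-by-size routine that keeps records intact:

    ```python
    import os

    # Hypothetical target chunk size; tune to your workload.
    TARGET_CHUNK_BYTES = 256 * 1024 * 1024

    def split_by_size(src_path: str, out_dir: str) -> None:
        """Split one large text file into ~equal-sized chunks on line boundaries."""
        os.makedirs(out_dir, exist_ok=True)
        part, written, out = 0, 0, None
        with open(src_path, "rb") as src:
            for line in src:  # splitting on lines keeps each record whole
                if out is None or written >= TARGET_CHUNK_BYTES:
                    if out is not None:
                        out.close()
                    out = open(os.path.join(out_dir, f"part-{part:05d}.csv"), "wb")
                    part, written = part + 1, 0
                out.write(line)
                written += len(line)
        if out is not None:
            out.close()

    # Hypothetical paths for illustration.
    split_by_size("input/big.csv", "output/chunks")
    ```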

    Hope this info helps.

    ----------

    Thank you
    Please consider clicking "Accept Answer" and "Upvote" on the post that helped you, as it can be beneficial to other community members.

