ImranMondal-3977 asked MartinJaffer-MSFT answered

DataFactory Performance while loading Blob To Table Storage

Hi Team,

I am trying to load a CSV file into Table storage, but it takes more than 30 minutes to load a 300 MB file. Please suggest how I can improve the performance of this copy.



azure-data-factory
capture-1.png (69.5 KiB)
capture-2.png (39.2 KiB)
capture-3.png (38.2 KiB)

Even after changing a few settings, the load still takes a very long time. Please help me resolve this.

capture3.png (82.7 KiB)
capture4.png (42.4 KiB)

Hello @ImranMondal-3977, I am bringing in others to help on your case.
In the short term, you can try increasing DIU to 20.
A long-term fix may involve analyzing the table's partitioning and data skew.
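
As a rough way to check for data skew before loading, the rows per candidate partition key value can be counted directly from the CSV. This is only a sketch: the column name `Region` and the sample data are hypothetical, and the real file would be opened with `open(path, newline="")` instead of the in-memory string used here.

```python
import csv
import io
from collections import Counter


def partition_counts(src, partition_column):
    """Count how many CSV rows fall under each value of partition_column."""
    reader = csv.DictReader(src)
    return Counter(row[partition_column] for row in reader)


# Hypothetical sample data; "Region" stands in for whichever column
# is mapped to the partition key in the copy activity.
sample = io.StringIO("Region,Amount\nEast,10\nEast,15\nEast,30\nWest,20\n")
counts = partition_counts(sample, "Region")
total = sum(counts.values())

# Print the values from most to least frequent with their share of all rows.
for value, n in counts.most_common():
    print(f"{value}: {n} rows ({n / total:.0%})")
```

If one value holds most of the rows, the writes are concentrated on a single partition, and a more evenly distributed partition key may be worth considering.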


Thank you @MartinJaffer-MSFT. I also tried increasing DIU to 32, but the result is still the same. I have attached a sample file for your reference. We are getting millions of rows in each file.
image.png (121.2 KiB)

1 Answer

MartinJaffer-MSFT answered

@ImranMondal-3977
I suspect the slowdown is caused by the "unique rowkey" option. My hypothesis is that, because ADF uses the "Insert or Merge" and "Insert or Replace" operations instead of the plain "Insert" operation, finding a new unique rowkey value is harder. It would have to guess a value, then query the table to check whether that value is already in use. If the rowkey is in use, it guesses again; if not, it inserts. The more rows in the table, the more likely it is to have to guess again.

If this is the case, then adding a unique value to your data and specifying it as the rowkey would speed things up. Table storage locates a row by the combination of partitionkey and rowkey.
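
One possible way to add such a unique value is to preprocess the CSV and prepend a sequence-number column, which can then be mapped to the rowkey in the copy activity. This is a minimal sketch under assumptions: the function name `add_row_key`, the column name `RowKey`, and the `prefix` parameter are all hypothetical, and the sequence is unique only within one file, so a per-file prefix would be needed if several files load into the same partition.

```python
import csv
import io


def add_row_key(src, dst, key_column="RowKey", prefix="", width=10):
    """Copy CSV rows from src to dst, prepending a unique key column.

    The key is an optional prefix followed by a zero-padded sequence
    number. Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return 0
    writer.writerow([key_column] + header)

    count = 0
    for count, row in enumerate(reader, start=1):
        writer.writerow([f"{prefix}{count:0{width}d}"] + row)
    return count


# Hypothetical sample data standing in for the real 300 MB file.
sample = io.StringIO("Region,Amount\nEast,10\nWest,20\n")
out = io.StringIO()
rows_written = add_row_key(sample, out)
print(out.getvalue())
```

The rows are streamed one at a time, so the whole file never has to fit in memory. Whether this actually removes the slowdown depends on the hypothesis above being correct.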
