question

PaulHernandez-8067 avatar image
0 Votes"
PaulHernandez-8067 asked PaulHernandez-8067 answered

DWU Consumption sink with staging vs. copy data with polybase vs. copy data with copy command

Dear experts,

I am analysing the different options to load data into azure synapses dedicated sql pool and found relevant differences in the DWU consumption.

I tried 3 different methods:

  1. Transforming the data in a data flow and then insert it with a sink with staging enable (is this using polybase in the background?)

  2. Transforming the data in a data flow, copy it to an ADLS folder in parquet format, then moving it to the database using a copy activity with Polybase.

  3. Transforming the data in a data flow, copy it to an ADLS folder in parquet format, then moving it to the database using a copy activity with the copy command.

After many tests I realized the DWU usage for the first method, the insert using a sink in a data flow, consumes around 4 times more DWUs:

91692-image.png


Could someone explain this behavior?

I repeated the tests several time, so this is not occasionally happens.

azure-synapse-analytics
image.png (30.6 KiB)
· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hello @PaulHernandez-8067 ,
Thanks for the ask and using the Microsoft Q&A platform .
At this time, we are reaching out to the internal team to get some help on this . We will update you once we hear back from them.
Thanks
Himanshu

0 Votes 0 ·

Hello @PaulHernandez-8067,
We haven’t heard from you on the last response and was just checking back to see if you have a resolution yet .In case if you have any resolution please do share that same with the community as it can be helpful to others . Otherwise, will respond back with the more details and we will try to help .
Thanks
Himanshu

0 Votes 0 ·
PaulHernandez-8067 avatar image
0 Votes"
PaulHernandez-8067 answered

Hi @HimanshuSinha-MSFT ,

thanks for asking.
According to the Microsoft support, when you use a Sink in a data flow with the staging option enabled, the component makes use of the copy command but in a different way as in the copy activity. The data flows are optimized for upserts scenarios and it worths to mention that we are doing full loads of the tables everytime, which is of course impacting the performance.
If I got ir correctly, the sink created a copy of the data in the data lake (staging) and the created temporary tables in the synapse SQL database and finally copying the values to the target table. It seems that for the copy activity this is not exactly the case and the step with the temporary tables is not required.

This would explain the different in the DWUcs consumption.

B.R.
Paul

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

HimanshuSinha-MSFT avatar image
0 Votes"
HimanshuSinha-MSFT answered HimanshuSinha-MSFT edited

Hello @PaulHernandez-8067 ,

Aplozoges for the delay in response from my side . In data flow when we have sink as Synapse with staging we fist copy the records to from the blob to a temp table and then from temp table to the target table . tghis may be the reason you are seeing a spike on the DWU consumption .

Thanks
Himanshu

5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.