AlonsoGarciaHector-9873 asked AlonsoGarciaHector-9873 commented

Azure Data Factory: How to order the execution flow within one data flow

I want to know if it's possible to order the execution of the different activities inside a data flow.
For this example:

I want to ensure that “sink source 1” executes before the “lookup source 1” activity.
I know I can use “custom sink ordering”, but in some cases I see the lookup's read of source 1 execute before the sink of source 1.
I also know that I can split this into two data flows to ensure that one runs before the other, but I want to avoid doing that.


Hello @AlonsoGarciaHector-9873 and welcome to Microsoft Q&A.

If I understand your request correctly, you want Sink1 to complete before the Lookup starts. Can you explain why it is important for the sink to complete first?
The only case I can think of where it would make a difference is if Source1 and Sink1 share a dataset (that is, Sink1 overwrites Source1). In that case, you could reuse the output of the Join instead of re-reading.

0 Votes 0 ·

Thanks for the reply. We need to do this because source2 has a referential-integrity dependency on source1. Your solution is great, but we cannot implement it because we have more sources with more references to each other.

We thought that with custom sink ordering the data flow would execute in the order we need. What is the expected flow if we set "1" for "sink source 1" and "2" for "sink source 2"?

0 Votes 0 ·

@AlonsoGarciaHector-9873 did Kiran's suggestion solve your issue? If so please mark as answer.

0 Votes 0 ·

1 Answer

Kiran-MSFT answered AlonsoGarciaHector-9873 commented

For this to work, you have to make a copy of source1 to join with source2; that is simply the definition of another source transform. Everything that is part of one connected graph starts to run together, but the sinks are written in the order specified for sink1 and sink2.
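To picture what "one connected graph" means here, the following is a minimal conceptual sketch in Python, not the service's actual scheduler. It only groups transformations into connected graphs and sorts sinks by their configured order. All names (`source1_copy`, `lookup`, the `connected_groups` and `sink_write_order` helpers) are hypothetical, and the sketch makes no claim about whether separate graphs run in parallel or one after another.

```python
from collections import defaultdict


def connected_groups(edges):
    """Group transformation names into connected graphs (edges treated as undirected)."""
    adjacency = defaultdict(set)
    for upstream, downstream in edges:
        adjacency[upstream].add(downstream)
        adjacency[downstream].add(upstream)

    seen, groups = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        # Depth-first walk collecting everything reachable from this node.
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups


def sink_write_order(sink_orders):
    """Return sink names sorted by their custom sink ordering value."""
    return [name for name, _ in sorted(sink_orders.items(), key=lambda item: item[1])]


# Hypothetical layout where one source transform feeds both the sink and the lookup.
shared = [("source1", "sink1"), ("source1", "lookup"),
          ("source2", "lookup"), ("lookup", "sink2")]

# Hypothetical layout with a second source transform defined for the lookup.
split = [("source1", "sink1"), ("source1_copy", "lookup"),
         ("source2", "lookup"), ("lookup", "sink2")]

shared_groups = connected_groups(shared)
split_groups = connected_groups(split)
order = sink_write_order({"sink2": 2, "sink1": 1})

print(len(shared_groups), len(split_groups), order)
```

In this toy model the shared layout forms a single graph, while defining a second source transform yields two graphs; the sink order is the same in both cases.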


@Kiran-MSFT could you clarify a couple points please?

Does only one graph run at a time, with no graphs running in parallel?

I was worried that graph 2 could start before graph 1 finishes, and that only the sink order was guaranteed.

0 Votes 0 ·

Hi @MartinJaffer-MSFT, valid concern (@Kiran-MSFT can confirm). @AlonsoGarciaHector-9873: If I were you, I would put all the flows with first-degree sinks in one DataFlow, and put the second-degree sinks (which depend on completion of the first-degree sinks) in a separate DataFlow. For example:

in DataFlow1:
InitialSource1 ---> all the transformations --> SinkSource1
InitialSource2 ---> transformations --> SinkSource2
InitialSource3 ---> join with initial Source1 (not dependent on SinkSource1) --> transformations --> SinkSource3

in a Different DataFlow2:
Source2 ---> join with Source1 (must start to pull data after DF1) + all other transformations --> Source2

Thanks! :)
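The two-DataFlow split above relies on the pipeline running DataFlow2 only after DataFlow1 succeeds. A minimal sketch of how that dependency might be expressed follows, written as Python that builds the pipeline definition as a dictionary. The activity and pipeline names are hypothetical, and the JSON shape (an `ExecuteDataFlow` activity with a `dependsOn` entry) reflects the pipeline format as I understand it, so it is worth comparing against a pipeline exported from your own factory.

```python
import json


def execute_data_flow(name, data_flow_name, depends_on=None):
    """Build an Execute Data Flow activity that waits for upstream activities to succeed."""
    return {
        "name": name,
        "type": "ExecuteDataFlow",
        "dependsOn": [
            {"activity": upstream, "dependencyConditions": ["Succeeded"]}
            for upstream in (depends_on or [])
        ],
        "typeProperties": {
            "dataflow": {"referenceName": data_flow_name, "type": "DataFlowReference"}
        },
    }


# Hypothetical pipeline: DataFlow2 starts only after DataFlow1 has succeeded.
pipeline = {
    "name": "LoadWithReferentialOrder",
    "properties": {
        "activities": [
            execute_data_flow("RunDataFlow1", "DataFlow1"),
            execute_data_flow("RunDataFlow2", "DataFlow2", depends_on=["RunDataFlow1"]),
        ]
    },
}

pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

In the authoring UI, the same dependency is drawn by connecting the success output of the first data flow activity to the second.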

1 Vote 1 ·

It worked like a charm! Thanks!

0 Votes 0 ·