Question

SateeshBattu-5298 asked:

How to get the source/sink output stream name / dataset name of a data flow in Data Factory programmatically (Java/Python/PowerShell/CLI)?

In a pipeline, a data flow has a source and a sink.

My requirement is to get the datasets used in the source and sink. I need this in the Java SDK, but Python, PowerShell, or the Azure CLI is also fine.

I need to fetch the details shown in the box highlighted in blue from the command line. Neither the pipeline-run show nor the pipeline show commands help retrieve the data flow source and sink details.

[Screenshot: mssupport.png, showing the data flow source and sink details highlighted in blue]


azure-data-factory

1 Answer

MartinJaffer-MSFT answered:

Hello @SateeshBattu-5298 and welcome to Microsoft Q&A.

It sounds like you want to retrieve the output stream/dataset from a given run or definition?
Since the stream is defined in the Data Flow definition, and the dataset is defined in the pipeline definition, these are not dynamic values. That makes me unsure what your end goal is: by knowing which Data Flow or pipeline was used, you already know which dataset was used. Also, knowing the stream name doesn't yield any actionable details; it isn't as though you could hook another service up to the output stream.

But if you really wanted to, I suppose you would first take the pipeline run ID and get the run details. From that you get the pipeline name. With the pipeline name you fetch the pipeline definition. Inside the pipeline definition, find the relevant copy activity; under that, the dataset is outputs[0].referenceName.
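
To make those steps concrete, here is a minimal sketch using the Python management SDK (azure-mgmt-datafactory); equivalent operations exist in the Java SDK, PowerShell, and the REST API. The subscription, resource group, factory name, and run ID below are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholders -- substitute your own values.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<factory-name>"
RUN_ID = "<pipeline-run-id>"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Step 1: pipeline run ID -> run details -> pipeline name.
run = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, RUN_ID)
pipeline_name = run.pipeline_name

# Step 2: pipeline name -> pipeline definition.
pipeline = client.pipelines.get(RESOURCE_GROUP, FACTORY_NAME, pipeline_name)

# Step 3: find the copy activities and read their dataset references
# (outputs[0].referenceName in the JSON view).
for activity in pipeline.activities:
    if activity.type == "Copy":
        inputs = [d.reference_name for d in (activity.inputs or [])]
        outputs = [d.reference_name for d in (activity.outputs or [])]
        print(f"Copy activity '{activity.name}': inputs={inputs} outputs={outputs}")
```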

For the data flow, once the Execute Data Flow activity is located in the pipeline definition, get the data flow name under typeProperties.dataflow.referenceName. With that name, fetch the data flow definition and look under properties.typeProperties.sinks[x].name.
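
Continuing the sketch above, the same walk for a data flow might look like the following. The model attribute names here (data_flow, sources, sinks, dataset, reference_name) mirror the JSON paths in the answer, but verify them against your SDK version; each source/sink entry carries both the stream name and, usually, a dataset reference:

```python
# Step 4: find the Execute Data Flow activities and resolve each
# data flow definition (typeProperties.dataflow.referenceName).
for activity in pipeline.activities:
    if activity.type == "ExecuteDataFlow":
        df_name = activity.data_flow.reference_name
        df = client.data_flows.get(RESOURCE_GROUP, FACTORY_NAME, df_name)

        # properties.typeProperties.sources[x] / sinks[x]: each entry has a
        # stream name and, for dataset-backed endpoints, a dataset reference.
        for src in df.properties.sources:
            dataset = src.dataset.reference_name if src.dataset else None
            print(f"Data flow '{df_name}' source: stream={src.name} dataset={dataset}")
        for sink in df.properties.sinks:
            dataset = sink.dataset.reference_name if sink.dataset else None
            print(f"Data flow '{df_name}' sink: stream={sink.name} dataset={dataset}")
```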




Hi @MartinJaffer-MSFT, thanks for your reply. My end goal is to capture the list of datasets (sources and sinks) that were part of the data flow activity and record them in our custom logs. I will try this approach and let you know if it suffices.


@MartinJaffer-MSFT This is what I was looking for; this solution meets my requirement. I accept this solution.

We can close this thread.


Okay, thanks for letting me know. I have converted the comment to an answer. If you could mark it as the accepted answer, that would be great, @SateeshBattu-5298!


Oh, there is one thing you need to be mindful of, @SateeshBattu-5298:

Consider the situation in which you run a pipeline, then afterwards update it and run it again. The APIs I referenced fetch whatever the CURRENTLY PUBLISHED version is. This means both the pre-update and post-update runs will point to the post-update version of the pipeline.

The good news is that I recall seeing a last-updated timestamp somewhere, so you can add logic to determine whether a run was pre-update or post-update.
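
As a sketch of that check, assuming you record the pipeline's last publish time yourself (the exact source of such a timestamp is not pinned down in this thread, so last_published below is a hypothetical input):

```python
from datetime import datetime, timezone

# Hypothetical: a last-publish timestamp from your own bookkeeping
# (e.g. your CI/CD run or Git history); not returned by the calls above.
last_published = datetime(2022, 5, 1, tzinfo=timezone.utc)

run = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, RUN_ID)
if run.run_start and run.run_start < last_published:
    print("Run predates the current published definition; names may differ.")
else:
    print("Run used the currently published definition.")
```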


Sure, I will make a note of it. Thanks for letting me know about this.
