Programmatically monitor an Azure Data Factory

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

This article describes how to monitor a pipeline in a data factory by using different software development kits (SDKs).

Note

We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.

Data range

Data Factory stores pipeline run data for only 45 days. When you query programmatically for pipeline run data, for example with the PowerShell command Get-AzDataFactoryV2PipelineRun, the optional LastUpdatedAfter and LastUpdatedBefore parameters have no maximum dates. However, if you query for data from the past year, for example, you don't get an error; you get only the pipeline run data from the last 45 days.

If you want to keep pipeline run data for more than 45 days, set up your own diagnostic logging with Azure Monitor.
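
Because an over-long query window is trimmed silently rather than rejected, it can help to work out on the client side which part of a requested window can still hold data. The following is a minimal sketch, not part of any SDK: the function name and the RETENTION_DAYS constant are hypothetical, and it assumes timezone-aware datetime values.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in this article; the constant name is illustrative.
RETENTION_DAYS = 45


def effective_query_window(last_updated_after, last_updated_before, now=None):
    """Return the part of the requested window that can still hold run data.

    Returns a (start, end) tuple, or None when the whole window is older
    than the retention period. All datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    oldest_available = now - timedelta(days=RETENTION_DAYS)
    # Anything before oldest_available is no longer stored, so move the start up.
    start = max(last_updated_after, oldest_available)
    if start >= last_updated_before:
        return None
    return start, last_updated_before


now = datetime.now(timezone.utc)
window = effective_query_window(now - timedelta(days=365), now, now=now)
print(window)
```

If the returned start is later than the start you asked for, the query can't return the older runs, and you might want to log a warning or read those runs from your Azure Monitor logs instead.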

Pipeline run information

For pipeline run properties, refer to the PipelineRun API reference. A pipeline run has a different status at different points in its lifecycle. The possible values of the run status are:

  • Queued
  • InProgress
  • Succeeded
  • Failed
  • Canceling
  • Cancelled
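
Monitoring code usually needs to know whether a status is final. The following sketch groups the values listed above into active and terminal sets. The grouping is an assumption made for illustration: in particular, it treats Canceling as still active, whereas the samples in this article stop polling on any status other than Queued or InProgress. The names are hypothetical.

```python
# Status values as listed in this article, grouped for illustration.
# Assumption: "Canceling" is a transitional state that ends in "Cancelled".
ACTIVE_STATUSES = frozenset({"Queued", "InProgress", "Canceling"})
TERMINAL_STATUSES = frozenset({"Succeeded", "Failed", "Cancelled"})


def is_terminal(status):
    """Return True once a pipeline run has reached a final state."""
    return status in TERMINAL_STATUSES


for status in ("Queued", "Canceling", "Succeeded"):
    print(status, is_terminal(status))
```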

.NET

For a complete walk-through of creating and monitoring a pipeline by using the .NET SDK, see Create a data factory and pipeline using .NET.

  1. Add the following code to continuously check the status of the pipeline run until it finishes copying the data.

    // Monitor the pipeline run
    Console.WriteLine("Checking pipeline run status...");
    PipelineRun pipelineRun;
    while (true)
    {
        pipelineRun = client.PipelineRuns.Get(resourceGroup, dataFactoryName, runResponse.RunId);
        Console.WriteLine("Status: " + pipelineRun.Status);
        if (pipelineRun.Status == "InProgress" || pipelineRun.Status == "Queued")
            System.Threading.Thread.Sleep(15000);
        else
            break;
    }
    
  2. Add the following code to retrieve copy activity run details, such as the size of the data read and written.

    // Check the copy activity run details
    Console.WriteLine("Checking copy activity run details...");
    
    RunFilterParameters filterParams = new RunFilterParameters(
        DateTime.UtcNow.AddMinutes(-10), DateTime.UtcNow.AddMinutes(10));
    ActivityRunsQueryResponse queryResponse = client.ActivityRuns.QueryByPipelineRun(
        resourceGroup, dataFactoryName, runResponse.RunId, filterParams);
    if (pipelineRun.Status == "Succeeded")
        Console.WriteLine(queryResponse.Value.First().Output);
    else
        Console.WriteLine(queryResponse.Value.First().Error);
    Console.WriteLine("\nPress any key to exit...");
    Console.ReadKey();
    

For complete documentation on the .NET SDK, see Data Factory .NET SDK reference.

Python

For a complete walk-through of creating and monitoring a pipeline by using the Python SDK, see Create a data factory and pipeline using Python.

To monitor the pipeline run, add the following code:

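# Imports used by this snippet (the walk-through's script already includes them)
import time
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

# Monitor the pipeline run: wait for it to make progress, then read its status once
time.sleep(30)
pipeline_run = adf_client.pipeline_runs.get(
    rg_name, df_name, run_response.run_id)
print("\n\tPipeline run status: {}".format(pipeline_run.status))

# Query the activity runs of this pipeline run, from one day ago to one day ahead
filter_params = RunFilterParameters(
    last_updated_after=datetime.now() - timedelta(1), last_updated_before=datetime.now() + timedelta(1))
query_response = adf_client.activity_runs.query_by_pipeline_run(
    rg_name, df_name, pipeline_run.run_id, filter_params)
print_activity_run_details(query_response.value[0])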
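
The snippet calls print_activity_run_details, which this article doesn't define; it comes from the walk-through script. The following is a minimal sketch of such a helper, not the walk-through's exact code. It assumes a copy activity whose output dictionary carries dataRead, dataWritten, and copyDuration, and whose error dictionary carries message. The describe_activity_run name and the sample values are hypothetical.

```python
from types import SimpleNamespace


def describe_activity_run(activity_run):
    """Build a short text summary of one activity run."""
    lines = ["Activity run status: {}".format(activity_run.status)]
    if activity_run.status == "Succeeded":
        # Assumption: copy activity output with these keys.
        output = activity_run.output or {}
        lines.append("Bytes read: {}".format(output.get("dataRead")))
        lines.append("Bytes written: {}".format(output.get("dataWritten")))
        lines.append("Copy duration (s): {}".format(output.get("copyDuration")))
    else:
        error = activity_run.error or {}
        lines.append("Error: {}".format(error.get("message")))
    return "\n".join(lines)


def print_activity_run_details(activity_run):
    """Print the summary built by describe_activity_run."""
    print(describe_activity_run(activity_run))


# Stand-in for an SDK activity run object, with made-up values.
sample = SimpleNamespace(
    status="Succeeded",
    output={"dataRead": 18, "dataWritten": 18, "copyDuration": 4},
    error=None,
)
print_activity_run_details(sample)
```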

For complete documentation on the Python SDK, see Data Factory Python SDK reference.
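
The Python snippet in this section reads the status once after a fixed 30-second wait, whereas the .NET, REST API, and PowerShell samples poll until the run is no longer Queued or InProgress. A polling loop along the same lines might look like the following sketch. The wait_for_pipeline_run function and its parameters are hypothetical; the status lookup is passed in as a callable so that the loop itself doesn't depend on the SDK.

```python
import time


def wait_for_pipeline_run(get_status, poll_seconds=15, timeout_seconds=3600,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the run leaves Queued/InProgress, then return the status.

    Raises TimeoutError if the run is still active after timeout_seconds.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status not in ("Queued", "InProgress"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "Pipeline run still {} after {} seconds".format(status, timeout_seconds))
        sleep(poll_seconds)


# With the SDK objects from the snippet in this section, a call could look like:
# final_status = wait_for_pipeline_run(
#     lambda: adf_client.pipeline_runs.get(rg_name, df_name, run_response.run_id).status)

# Self-contained demonstration with canned statuses and no real waiting.
canned = iter(["Queued", "InProgress", "Succeeded"])
print(wait_for_pipeline_run(lambda: next(canned), sleep=lambda seconds: None))
```

The timeout is there so that a stuck run doesn't keep the script polling indefinitely; choose a value that suits your longest expected pipeline.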

REST API

For a complete walk-through of creating and monitoring a pipeline by using the REST API, see Create a data factory and pipeline using REST API.

  1. Run the following script to continuously check the pipeline run status until it finishes copying the data.

    # Build the URL for this pipeline run, using the same variable names as step 2
    $request = "https://management.azure.com/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelineruns/${runId}?api-version=${apiVersion}"
    while ($True) {
        $response = Invoke-RestMethod -Method GET -Uri $request -Headers $authHeader
        Write-Host "Pipeline run status: " $response.Status -foregroundcolor "Yellow"
    
        # Keep polling while the run is still active
        if ( ($response.Status -eq "InProgress") -or ($response.Status -eq "Queued") ) {
            Start-Sleep -Seconds 15
        }
        else {
            # The run has finished: print the full response and stop
            $response | ConvertTo-Json
            break
        }
    }
    
  2. Run the following script to retrieve copy activity run details, such as the size of the data read and written.

    $request = "https://management.azure.com/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.DataFactory/factories/${factoryName}/pipelineruns/${runId}/queryActivityruns?api-version=${apiVersion}&startTime="+(Get-Date).ToString('yyyy-MM-dd')+"&endTime="+(Get-Date).AddDays(1).ToString('yyyy-MM-dd')+"&pipelineName=Adfv2QuickStartPipeline"
    $response = Invoke-RestMethod -Method POST -Uri $request -Headers $authHeader
    $response | ConvertTo-Json
    

For complete documentation on the REST API, see Data Factory REST API reference.
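
The REST API reference describes the time window for the activity runs query as lastUpdatedAfter and lastUpdatedBefore properties in the JSON request body. The following sketch only builds the URL and body for such a request; it doesn't send it. The function names are hypothetical, the api-version default of 2018-06-01 is an assumption (the script in this section reads it from $apiVersion), and the datetimes must be timezone-aware.

```python
import json
from datetime import datetime, timedelta, timezone

MANAGEMENT_ENDPOINT = "https://management.azure.com"


def to_utc_iso(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_activity_runs_query(subscription_id, resource_group, factory_name, run_id,
                              last_updated_after, last_updated_before,
                              api_version="2018-06-01"):
    """Return (url, json_body) for a queryActivityruns POST request."""
    url = (
        "{}/subscriptions/{}/resourceGroups/{}/providers/Microsoft.DataFactory"
        "/factories/{}/pipelineruns/{}/queryActivityruns?api-version={}"
    ).format(MANAGEMENT_ENDPOINT, subscription_id, resource_group,
             factory_name, run_id, api_version)
    body = {
        "lastUpdatedAfter": to_utc_iso(last_updated_after),
        "lastUpdatedBefore": to_utc_iso(last_updated_before),
    }
    return url, json.dumps(body)


# Placeholder identifiers; replace them with your own values.
now = datetime.now(timezone.utc)
url, body = build_activity_runs_query(
    "<subscription-id>", "<resource-group>", "<factory-name>", "<run-id>",
    now - timedelta(hours=1), now + timedelta(hours=1))
print(url)
print(body)
```

You would send the body with the same authorization header as the other requests and a Content-Type of application/json.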

PowerShell

For a complete walk-through of creating and monitoring a pipeline using PowerShell, see Create a data factory and pipeline using PowerShell.

  1. Run the following script to continuously check the pipeline run status until it finishes copying the data.

    while ($True) {
        $run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $resourceGroupName -DataFactoryName $DataFactoryName -PipelineRunId $runId
    
        if ($run) {
            if ( ($run.Status -ne "InProgress") -and ($run.Status -ne "Queued") ) {
                Write-Output ("Pipeline run finished. The status is: " +  $run.Status)
                $run
                break
            }
            Write-Output ("Pipeline is running...status: " + $run.Status)
        }
    
        Start-Sleep -Seconds 30
    }
    
  2. Run the following script to retrieve copy activity run details, such as the size of the data read and written.

    Write-Host "Activity run details:" -foregroundcolor "Yellow"
    $result = Get-AzDataFactoryV2ActivityRun -DataFactoryName $dataFactoryName -ResourceGroupName $resourceGroupName -PipelineRunId $runId -RunStartedAfter (Get-Date).AddMinutes(-30) -RunStartedBefore (Get-Date).AddMinutes(30)
    $result
    
    Write-Host "Activity 'Output' section:" -foregroundcolor "Yellow"
    $result.Output -join "`r`n"
    
    Write-Host "`nActivity 'Error' section:" -foregroundcolor "Yellow"
    $result.Error -join "`r`n"
    

For complete documentation on PowerShell cmdlets, see Data Factory PowerShell cmdlet reference.

To learn about using Azure Monitor to monitor Data Factory pipelines, see Monitor pipelines using Azure Monitor.