How to get lineage from Azure Synapse Analytics into Azure Purview

This document explains how to connect an Azure Synapse workspace to an Azure Purview account to track data lineage. It also details the coverage scope and the supported lineage capabilities.

Supported Azure Synapse capabilities

Currently, Azure Purview captures runtime lineage from the following Azure Synapse pipeline activities, each covered in its own section below: Copy activity and Data Flow.

Important

Azure Purview drops lineage if the source or destination uses an unsupported data storage system.

Copy activity support

Data store Supported
Azure Blob Storage Yes
Azure Cognitive Search Yes
Azure Cosmos DB (SQL API) * Yes
Azure Cosmos DB's API for MongoDB * Yes
Azure Data Explorer * Yes
Azure Data Lake Storage Gen1 Yes
Azure Data Lake Storage Gen2 Yes
Azure Database for MariaDB * Yes
Azure Database for MySQL * Yes
Azure Database for PostgreSQL * Yes
Azure Files Yes
Azure SQL Database * Yes
Azure SQL Managed Instance * Yes
Azure Synapse Analytics * Yes
Azure Table Storage Yes
Amazon S3 Yes
Hive * Yes
SAP Table (when connecting to SAP ECC or SAP S/4HANA) Yes
SQL Server * Yes
Teradata * Yes

* Azure Purview currently doesn't support queries or stored procedures as sources for lineage or scanning. Lineage is limited to table and view sources only.
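The table and the note above can be read as a pre-check before running a pipeline: lineage is dropped when either end uses a store outside the list, and starred stores only report lineage for table and view sources. The following is a minimal Python sketch of that check. The function name `lineage_blocker` and its parameters are hypothetical and not part of any Purview or Synapse API, and the store names are assumed to be spelled exactly as in the table.

```python
from typing import Optional

# Stores from the copy activity table that carry no footnote marker.
UNRESTRICTED_STORES = {
    "Azure Blob Storage", "Azure Cognitive Search",
    "Azure Data Lake Storage Gen1", "Azure Data Lake Storage Gen2",
    "Azure Files", "Azure Table Storage", "Amazon S3",
    "SAP Table",
}

# Stores marked with * in the table: table and view sources only.
TABLE_OR_VIEW_ONLY = {
    "Azure Cosmos DB (SQL API)", "Azure Cosmos DB's API for MongoDB",
    "Azure Data Explorer", "Azure Database for MariaDB",
    "Azure Database for MySQL", "Azure Database for PostgreSQL",
    "Azure SQL Database", "Azure SQL Managed Instance",
    "Azure Synapse Analytics", "Hive", "SQL Server", "Teradata",
}

SUPPORTED_STORES = UNRESTRICTED_STORES | TABLE_OR_VIEW_ONLY


def lineage_blocker(source: str, sink: str,
                    source_uses_query: bool = False) -> Optional[str]:
    """Return a reason why lineage would not be reported, or None.

    Hypothetical helper: it only encodes the table and footnote above.
    """
    for role, store in (("source", source), ("sink", sink)):
        if store not in SUPPORTED_STORES:
            return f"{role} store '{store}' is not in the supported list"
    if source_uses_query and source in TABLE_OR_VIEW_ONLY:
        return f"source '{source}' is read through a query or stored procedure"
    return None


# None means the table and footnote give no reason to expect dropped lineage.
print(lineage_blocker("Azure Blob Storage", "Azure SQL Database"))
print(lineage_blocker("SQL Server", "Azure Blob Storage", source_uses_query=True))
```

A return value of `None` only means that nothing in the table rules lineage out; the copy activity limitations listed in the next section still apply.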

Known limitations on copy activity lineage

Currently, lineage is not captured when a copy activity uses any of the following features:

  • Copy data into Azure Data Lake Storage Gen1 using Binary format.
  • Copy data into Azure Synapse Analytics using PolyBase or COPY statement.
  • Compression setting for Binary, delimited text, Excel, JSON, and XML files.
  • Source partition options for Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server, and SAP Table.
  • Copy data to file-based sink with setting of max rows per file.
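The list above can be turned into a small checklist that flags a copy activity before it runs. The sketch below is illustrative only: `CopySettings` and all of its field names are hypothetical and do not mirror the real pipeline JSON, and it assumes that the compression limitation applies to either the source or the sink format and that `max_rows_per_file` is only set for file-based sinks.

```python
from dataclasses import dataclass
from typing import List, Optional

# Formats named in the compression limitation.
COMPRESSION_FORMATS = {"Binary", "DelimitedText", "Excel", "JSON", "XML"}

# Source stores named in the partition option limitation.
PARTITION_STORES = {
    "Azure SQL Database", "Azure SQL Managed Instance",
    "Azure Synapse Analytics", "SQL Server", "SAP Table",
}


@dataclass
class CopySettings:
    """Hypothetical, simplified description of one copy activity."""
    source_store: str
    sink_store: str
    source_format: str = ""
    sink_format: str = ""
    compression_enabled: bool = False
    source_partitioned: bool = False
    uses_polybase_or_copy_statement: bool = False
    max_rows_per_file: Optional[int] = None


def unsupported_features(s: CopySettings) -> List[str]:
    """Return the known limitations that this copy activity would hit."""
    hits = []
    if s.sink_store == "Azure Data Lake Storage Gen1" and s.sink_format == "Binary":
        hits.append("Binary format into Azure Data Lake Storage Gen1")
    if s.sink_store == "Azure Synapse Analytics" and s.uses_polybase_or_copy_statement:
        hits.append("PolyBase or COPY statement into Azure Synapse Analytics")
    if s.compression_enabled and (
        s.source_format in COMPRESSION_FORMATS or s.sink_format in COMPRESSION_FORMATS
    ):
        hits.append("compression setting on a file format")
    if s.source_partitioned and s.source_store in PARTITION_STORES:
        hits.append("source partition option")
    if s.max_rows_per_file is not None:
        hits.append("max rows per file on a file-based sink")
    return hits


settings = CopySettings(
    source_store="SQL Server",
    sink_store="Azure Synapse Analytics",
    source_partitioned=True,
    uses_polybase_or_copy_statement=True,
)
for reason in unsupported_features(settings):
    print("Lineage not captured:", reason)
```

An empty list means that none of the listed limitations apply; it does not by itself guarantee that lineage will be reported.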

In addition to lineage, the data asset schema (shown in the Asset -> Schema tab) is reported for the following connectors:

  • CSV and Parquet files on Azure Blob, Azure Files, ADLS Gen1, ADLS Gen2, and Amazon S3
  • Azure Data Explorer, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server, Teradata

Data Flow support

Data store Supported
Azure Blob Storage Yes
Azure Cosmos DB (SQL API) * Yes
Azure Data Lake Storage Gen1 Yes
Azure Data Lake Storage Gen2 Yes
Azure Database for MySQL * Yes
Azure Database for PostgreSQL * Yes
Azure SQL Database * Yes
Azure SQL Managed Instance * Yes
Azure Synapse Analytics * Yes

* Azure Purview currently doesn't support queries or stored procedures as sources for lineage or scanning. Lineage is limited to table and view sources only.

Access secured Azure Purview account

If your Purview account is protected by a firewall, learn how to let Azure Synapse access a secured Purview account through Purview private endpoints.

Bring Azure Synapse lineage into Purview

Step 1: Connect Azure Synapse workspace to your Purview account

Connecting an Azure Synapse workspace to Purview enables Azure Synapse to push lineage information to Purview. Follow the steps in Connect Synapse workspace to Azure Purview. Multiple Azure Synapse workspaces can connect to a single Azure Purview account for holistic lineage tracking.

Step 2: Run pipeline in Azure Synapse workspace

You can create pipelines with the Copy activity in an Azure Synapse workspace. No additional configuration is needed to capture lineage data; it is captured automatically while the activity runs.

Step 3: Monitor lineage reporting status

After you run the Azure Synapse pipeline, you can check the lineage reporting status in the Synapse pipeline monitoring view by selecting the Lineage status button. The same information is also available in the activity output JSON -> reportLineageToPurview section.

Monitor the lineage reporting status in pipeline monitoring view.
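If you have saved or retrieved the activity output JSON, the lineage section can also be read programmatically. The following is a minimal Python sketch. It assumes that the section is named `reportLineageToPurview` and contains a `status` field; the sample payload is made up for illustration, and the helper `lineage_status` is hypothetical. Adjust the key if your activity output differs.

```python
import json
from typing import Optional

# Assumed name of the lineage section in the activity output JSON.
LINEAGE_KEY = "reportLineageToPurview"


def lineage_status(activity_output_json: str) -> Optional[str]:
    """Return the lineage reporting status, or None if the section is absent."""
    output = json.loads(activity_output_json)
    section = output.get(LINEAGE_KEY)
    if not isinstance(section, dict):
        return None
    return section.get("status")


# Made-up sample payload; real activity output contains many more fields.
sample = '{"dataRead": 1024, "reportLineageToPurview": {"status": "Succeeded"}}'
print(lineage_status(sample))
```

Returning `None` for a missing section keeps the two cases apart: an activity that reported a status and an activity whose output carries no lineage information at all.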

Step 4: View lineage information in your Purview account

In your Purview account, you can browse assets and choose the asset type "Azure Synapse Analytics". You can also search the Data Catalog using keywords.

Browse the Azure Synapse assets in Purview.

Select the Synapse account -> pipeline -> activity to view the lineage information.

Browse the Azure Synapse pipeline lineage in Purview.

Next steps

Catalog lineage user guide

Link to Azure Data Share for lineage