How to get lineage from Azure Synapse Analytics into Microsoft Purview
This document explains how to connect an Azure Synapse workspace to a Microsoft Purview account to track data lineage. It also describes the coverage scope and the supported lineage capabilities.
Supported Azure Synapse capabilities
Currently, Microsoft Purview captures runtime lineage from the following Azure Synapse pipeline activities:
- Copy activity
- Data Flow
Important
Microsoft Purview drops lineage if the source or destination uses an unsupported data storage system.
Copy activity support
Data store | Supported |
---|---|
Azure Blob Storage | Yes |
Azure Cognitive Search | Yes |
Azure Cosmos DB (SQL API) * | Yes |
Azure Cosmos DB's API for MongoDB * | Yes |
Azure Data Explorer * | Yes |
Azure Data Lake Storage Gen1 | Yes |
Azure Data Lake Storage Gen2 | Yes |
Azure Database for MariaDB * | Yes |
Azure Database for MySQL * | Yes |
Azure Database for PostgreSQL * | Yes |
Azure Files | Yes |
Azure SQL Database * | Yes |
Azure SQL Managed Instance * | Yes |
Azure Synapse Analytics * | Yes |
Azure Dedicated SQL pool (formerly SQL DW) * | Yes |
Azure Table Storage | Yes |
Amazon S3 | Yes |
Hive * | Yes |
Oracle * | Yes |
SAP Table (when connecting to SAP ECC or SAP S/4HANA) | Yes |
SQL Server * | Yes |
Teradata * | Yes |
* Microsoft Purview currently doesn't support query or stored procedure for lineage or scanning. Lineage is limited to table and view sources only.
If you use Self-hosted Integration Runtime, note the minimal version with lineage support for:
- Any use case: version 5.9.7885.3 or later
- Copying data from Oracle: version 5.10 or later
- Copying data into Azure Synapse Analytics via COPY command or PolyBase: version 5.10 or later
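The version thresholds above can be checked programmatically. The following is a minimal sketch, assuming the Self-hosted Integration Runtime version is available as a dotted string; the function and parameter names (`supports_lineage`, `oracle_source`, `synapse_bulk_load`) are hypothetical and not part of any Microsoft API.

```python
def parse_version(text):
    """Turn a dotted version string such as '5.9.7885.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum versions quoted in the list above.
MIN_ANY_USE_CASE = parse_version("5.9.7885.3")
MIN_ORACLE_OR_SYNAPSE_BULK = parse_version("5.10")


def supports_lineage(ir_version, oracle_source=False, synapse_bulk_load=False):
    """Return True if the given runtime version meets the documented minimum.

    oracle_source: the copy reads from Oracle (hypothetical flag).
    synapse_bulk_load: the copy writes into Azure Synapse Analytics via
    COPY command or PolyBase (hypothetical flag).
    """
    if oracle_source or synapse_bulk_load:
        required = MIN_ORACLE_OR_SYNAPSE_BULK
    else:
        required = MIN_ANY_USE_CASE
    # Tuples compare element by element, so (5, 10) ranks above (5, 9, 7885, 3).
    return parse_version(ir_version) >= required
```

Comparing integer tuples rather than raw strings avoids treating "5.10" as older than "5.9", which a plain string comparison would do.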
Limitations on copy activity lineage
Currently, lineage is not yet supported when you use any of the following copy activity features:
- Copy data into Azure Data Lake Storage Gen1 using Binary format.
- Compression setting for Binary, delimited text, Excel, JSON, and XML files.
- Source partition options for Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server, and SAP Table.
- Copy data to file-based sink with setting of max rows per file.
In addition to lineage, the data asset schema (shown in Asset -> Schema tab) is reported for the following connectors:
- CSV and Parquet files on Azure Blob, Azure Files, ADLS Gen1, ADLS Gen2, and Amazon S3
- Azure Data Explorer, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server, Teradata
Data Flow support
Data store | Supported |
---|---|
Azure Blob Storage | Yes |
Azure Cosmos DB (SQL API) * | Yes |
Azure Data Lake Storage Gen1 | Yes |
Azure Data Lake Storage Gen2 | Yes |
Azure Database for MySQL * | Yes |
Azure Database for PostgreSQL * | Yes |
Azure SQL Database * | Yes |
Azure SQL Managed Instance * | Yes |
Azure Synapse Analytics * | Yes |
Azure Dedicated SQL pool (formerly SQL DW) * | Yes |
* Microsoft Purview currently doesn't support query or stored procedure for lineage or scanning. Lineage is limited to table and view sources only.
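As a rough illustration of how the table and footnote combine with the rule that lineage is dropped for unsupported stores, the sketch below encodes the Data Flow table as a lookup. The function name `lineage_expected` and the `source_uses_query` flag are hypothetical, and the store names are simply the labels used in the table above.

```python
# Stores listed in the Data Flow support table above.
DATA_FLOW_STORES = frozenset({
    "Azure Blob Storage",
    "Azure Cosmos DB (SQL API)",
    "Azure Data Lake Storage Gen1",
    "Azure Data Lake Storage Gen2",
    "Azure Database for MySQL",
    "Azure Database for PostgreSQL",
    "Azure SQL Database",
    "Azure SQL Managed Instance",
    "Azure Synapse Analytics",
    "Azure Dedicated SQL pool (formerly SQL DW)",
})

# Stores marked with * in the table: lineage covers table and view sources only.
TABLE_OR_VIEW_ONLY = DATA_FLOW_STORES - {
    "Azure Blob Storage",
    "Azure Data Lake Storage Gen1",
    "Azure Data Lake Storage Gen2",
}


def lineage_expected(source_store, sink_store, source_uses_query=False):
    """Return True if Data Flow lineage would be expected for this pair.

    Lineage is dropped when either side is an unsupported store, and a
    query or stored procedure source on a starred store is not covered.
    """
    if source_store not in DATA_FLOW_STORES or sink_store not in DATA_FLOW_STORES:
        return False
    if source_uses_query and source_store in TABLE_OR_VIEW_ONLY:
        return False
    return True
```

For example, `lineage_expected("Azure Blob Storage", "Amazon S3")` returns `False` because Amazon S3 appears only in the Copy activity table, not in the Data Flow table.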
Limitations on data flow lineage
Currently, data flow lineage doesn't integrate with Microsoft Purview resource set.
Access secured Microsoft Purview account
If your Microsoft Purview account is protected by a firewall, learn how to let Azure Synapse access a secured Microsoft Purview account through Microsoft Purview private endpoints.
Bring Azure Synapse lineage into Microsoft Purview
Step 1: Connect Azure Synapse workspace to your Microsoft Purview account
You can connect an Azure Synapse workspace to Microsoft Purview, and the connection enables Azure Synapse to push lineage information to Microsoft Purview. Follow the steps in Connect Synapse workspace to Microsoft Purview. Multiple Azure Synapse workspaces can connect to a single Microsoft Purview account for holistic lineage tracking.
Step 2: Run pipeline in Azure Synapse workspace
You can create pipelines with Copy activity in an Azure Synapse workspace. You don't need any additional configuration for lineage data capture. Lineage data is captured automatically during activity execution.
Step 3: Monitor lineage reporting status
After you run the Azure Synapse pipeline, you can check the lineage reporting status in the Synapse pipeline monitoring view by selecting the Lineage status button. The same information is also available in the activity output JSON -> reportLineageToPurvew section.
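If you have the activity output JSON as text, the lineage section can be pulled out with a few lines of Python. This is only a sketch: the key name is copied verbatim from the text above, and the sample payload, including the `rowsCopied` and `status` fields, is a made-up illustration rather than the documented output shape.

```python
import json

# Key name copied verbatim from the text above; verify it against real output.
LINEAGE_KEY = "reportLineageToPurvew"


def lineage_section(activity_output_json):
    """Return the lineage reporting section of an activity output, or None."""
    output = json.loads(activity_output_json)
    return output.get(LINEAGE_KEY)


# Hypothetical sample payload, for illustration only.
sample_output = '{"rowsCopied": 10, "reportLineageToPurvew": {"status": "Succeeded"}}'
section = lineage_section(sample_output)
print(section)
```

A return value of `None` means the section is absent from the output, which is worth distinguishing from a section that is present but reports a failure.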
Step 4: View lineage information in your Microsoft Purview account
In your Microsoft Purview account, you can browse assets and choose type "Azure Synapse Analytics". You can also search the Data Catalog using keywords.
Select the Synapse account -> pipeline -> activity to view the lineage information.