Connect to and manage Azure Synapse Analytics workspaces in Azure Purview

This article outlines how to register Azure Synapse Analytics workspaces and how to authenticate and interact with Azure Synapse Analytics workspaces in Azure Purview. For more information about Azure Purview, read the introductory article.

Supported capabilities

Metadata Extraction Full Scan Incremental Scan Scoped Scan Classification Access Policy Lineage
Yes Yes Yes Yes Yes No Yes- Synapse pipelines

Prerequisites

Register

This section describes how to register Azure Synapse Analytics workspaces in Azure Purview using the Purview Studio.

Authentication for registration

Only users with at least a Reader role on the Azure Synapse workspace who is also data source administrators in Azure Purview can register an Azure Synapse workspace.

Steps to register

  1. Go to your Azure Purview account.

  2. On the left pane, select Sources.

  3. Select Register.

  4. Under Register sources, select Azure Synapse Analytics (multiple).

  5. Select Continue.

    Screenshot of a selection of sources in Azure Purview, including Azure Synapse Analytics.

  6. On the Register sources (Azure Synapse Analytics) page, do the following:

    a. Enter a Name for the data source to be listed in the data catalog.
    b. Optionally, choose a subscription to filter down to.
    c. In the Workspace name dropdown list, select the workspace that you're working with.
    d. In the endpoints dropdown lists, the SQL endpoints are automatically filled in based on your workspace selection.
    e. In the Select a collection dropdown list, choose the collection you're working with or, optionally, create a new one.
    f. Select Register to finish registering the data source.

    Screenshot of the 'Register sources (Azure Synapse Analytics)' page for entering details about the Azure Synapse source.

Scan

Follow the steps below to scan Azure Synapse Analytics workspaces to automatically identify assets and classify your data. For more information about scanning in general, see our introduction to scans and ingestion.

You will first need to set up authentication for enumerating for either your dedicated or serverless resources. This will allow Purview to enumerate your workspace assets and perform scoped scans.

Then, you will need to apply permissions to scan the contents of the workspace.

Authentication for enumerating dedicated SQL database resources

  1. In the Azure portal, go to the Azure Synapse workspace resource.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role on the resource.

  3. Select the Add button.

  4. Set the Reader role and enter your Azure Purview account name, which represents its managed service identity (MSI).

  5. Select Save to finish assigning the role.

Note

If you're planning to register and scan multiple Azure Synapse workspaces in your Azure Purview account, you can also assign the role from a higher level, such as a resource group or a subscription.

Authentication for enumerating serverless SQL database resources

There are three places you will need to set authentication to allow Purview to enumerate your serverless SQL database resources: the Synapse workspace, the associated storage, and on the Serverless databases. The steps below will set permissions for all three.

  1. In the Azure portal, go to the Azure Synapse workspace resource.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role on the resource.

  3. Select the Add button.

  4. Set the Reader role and enter your Azure Purview account name, which represents its managed service identity (MSI).

  5. Select Save to finish assigning the role.

  6. In the Azure portal, go to the Resource group or Subscription that the Azure Synapse workspace is in.

  7. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role in the Resource group or Subscription fields.

  8. Select the Add button.

  9. Set the Storage blob data reader role and enter your Azure Purview account name (which represents its MSI) in the Select box.

  10. Select Save to finish assigning the role.

  11. Go to your Azure Synapse workspace and open the Synapse Studio.

  12. Select the Data tab on the left menu.

  13. Select the ellipsis (...) next to one of your databases, and then start a new SQL script.

  14. Add the Azure Purview account MSI (represented by the account name) on the serverless SQL databases. You do so by running the following command in your SQL script:

    CREATE LOGIN [PurviewAccountName] FROM EXTERNAL PROVIDER;
    

Apply permissions to scan the contents of the workspace

You can set up authentication for an Azure Synapse source in either of two ways:

  • Use a managed identity
  • Use a service principal

Important

These steps for serverless databases do not apply to replicated databases. Currently in Synapse, serverless databases that are replicated from Spark databases are read-only. For more information, go here.

Note

You must set up authentication on each dedicated SQL database in your Azure Synapse workspace that you intend to register and scan. The permissions that are mentioned in the following sections for serverless SQL database apply to all databases within your workspace. That is, you'll have to set up authentication only once.

Use a managed identity for dedicated SQL databases

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your dedicated SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

    Note

    To run the commands in the following procedure, you must be an Azure Synapse administrator on the workspace. For more information about Azure Synapse Analytics permissions, see: Set up access control for your Azure Synapse workspace.

  4. Add the Azure Purview account MSI (represented by the account name) as db_datareader on the dedicated SQL database. You do so by running the following command in your SQL script:

    CREATE USER [PurviewAccountName] FROM EXTERNAL PROVIDER
    GO
    
    EXEC sp_addrolemember 'db_datareader', [PurviewAccountName]
    GO
    

Use a managed identity for serverless SQL databases

  1. Go to your Azure Synapse workspace.
  2. Go to the Data section, and follow the next steps for each database you want to scan.
  3. Select the ellipsis (...) next to your database, and then start a new SQL script.
  4. Add the Azure Purview account MSI (represented by the account name) as db_datareader on the serverless SQL databases. You do so by running the following command in your SQL script:
    CREATE USER [PurviewAccountName] FOR LOGIN [PurviewAccountName];
    ALTER ROLE db_datareader ADD MEMBER [PurviewAccountName]; 
    

Grant permission to use credentials for external tables

If the Azure Synapse workspace has any external tables, the Azure Purview managed identity must be given References permission on the external table scoped credentials. With the References permission, Azure Purview can read data from external tables.

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[scoped_credential] TO [PurviewAccountName];

Use a service principal for dedicated SQL databases

Note

You must first set up a new credential of type Service Principal by following the instructions in Credentials for source authentication in Azure Purview.

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your dedicated SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

  4. Add the Service Principal ID as db_datareader on the dedicated SQL database. You do so by running the following command in your SQL script:

    CREATE USER [ServicePrincipalID] FROM EXTERNAL PROVIDER
    GO
    
    EXEC sp_addrolemember 'db_datareader', [ServicePrincipalID]
    GO
    

Note

Repeat the previous step for all dedicated SQL databases in your Synapse workspace.

Use a service principal for serverless SQL databases

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your serverless SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

  4. Add the Service Principal ID on the serverless SQL databases. You do so by running the following command in your SQL script:

    CREATE LOGIN [ServicePrincipalID] FROM EXTERNAL PROVIDER;
    
  5. Add Service Principal ID as db_datareader on each of the serverless SQL databases you want to scan. You do so by running the following command in your SQL script:

     CREATE USER [ServicePrincipalID] FOR LOGIN [ServicePrincipalID];
     ALTER ROLE db_datareader ADD MEMBER [ServicePrincipalID]; 
    

Set up Azure Synapse workspace firewall access

  1. In the Azure portal, go to the Azure Synapse workspace.

  2. On the left pane, select Firewalls.

  3. For Allow Azure services and resources to access this workspace control, select ON.

  4. Select Save.

Create and run scan

To create and run a new scan, do the following:

  1. Select the Data Map tab on the left pane in Purview Studio.

  2. Select the data source that you registered.

  3. Select View details, and then select New scan. Alternatively, you can select the Scan quick action icon on the source tile.

  4. On the Scan details pane, in the Name box, enter a name for the scan.

  5. In the Type dropdown list, select the types of resources that you want to scan within this source. SQL Database is the only type we currently support within an Azure Synapse workspace.

    Screenshot of the details pane for the Azure Synapse source scan.

  6. In the Credential dropdown list, select the credential to connect to the resources within your data source.

  7. Within each type, you can select to scan either all the resources or a subset of them by name.

  8. Select Continue to proceed.

  9. Select Scan rule sets of type Azure Synapse SQL. You can also create scan rule sets inline.

  10. Choose your scan trigger. You can schedule it to run weekly/monthly or once.

  11. Review your scan, and then select Save to complete the setup.

View your scans and scan runs

To view existing scans, do the following:

  1. Go to the Purview Studio. Select the Data Map tab under the left pane.

  2. Select the desired data source. You will see a list of existing scans on that data source under Recent scans, or can view all scans under the Scans tab.

  3. Select the scan that has results you want to view.

  4. This page will show you all of the previous scan runs along with the status and metrics for each scan run. It will also display whether your scan was scheduled or manual, how many assets had classifications applied, how many total assets were discovered, the start and end time of the scan, and the total scan duration.

Manage your scans - edit, delete, or cancel

To manage or delete a scan, do the following:

  1. Go to the Purview Studio. Select the Data Map tab under the left pane.

  2. Select the desired data source. You will see a list of existing scans on that data source under Recent scans, or can view all scans under the Scans tab.

  3. Select the scan you would like to manage. You can edit the scan by selecting Edit scan.

  4. You can cancel an in progress scan by selecting Cancel scan run.

  5. You can delete your scan by selecting Delete scan.

Note

  • Deleting your scan does not delete catalog assets created from previous scans.
  • The asset will no longer be updated with schema changes if your source table has changed and you re-scan the source table after editing the description in the schema tab of Purview.

Next steps

Now that you have registered your source, follow the below guides to learn more about Purview and your data.