Register and scan Azure Synapse Analytics workspaces

This article outlines how to register an Azure Synapse Analytics workspace in Azure Purview and set up a scan on it.

Supported capabilities

Azure Synapse Analytics workspace scans support capturing metadata and schema for the dedicated and serverless SQL databases within them. The workspace scans also classify the data automatically, based on system and custom classification rules.

Prerequisites

  • Before you register your data sources, create an Azure Purview account. For more information, see Quickstart: Create an Azure Purview account.
  • You must be an Azure Purview data source administrator.
  • Set up authentication as described in the following sections.

Register and scan an Azure Synapse workspace

Important

To scan your workspace successfully, follow the steps and apply the permissions exactly as they're outlined in the next sections.

Step 1: Register your source

Note

Only users with at least a Reader role on the Azure Synapse workspace who are also data source administrators in Azure Purview can perform this step.

To register a new Azure Synapse Source in your data catalog, do the following:

  1. Go to your Azure Purview account.

  2. On the left pane, select Sources.

  3. Select Register.

  4. Under Register sources, select Azure Synapse Analytics (multiple).

  5. Select Continue.

    Screenshot of a selection of sources in Azure Purview, including Azure Synapse Analytics.

  6. On the Register sources (Azure Synapse Analytics) page, do the following:

    a. Enter a Name for the data source to be listed in the data catalog.
    b. Optionally, choose a subscription to filter down to.
    c. In the Workspace name dropdown list, select the workspace that you're working with.
    d. In the endpoints dropdown lists, the SQL endpoints are automatically filled in based on your workspace selection.
    e. In the Select a collection dropdown list, choose the collection you're working with or, optionally, create a new one.
    f. Select Register to finish registering the data source.

    Screenshot of the 'Register sources (Azure Synapse Analytics)' page for entering details about the Azure Synapse source.

Step 2: Apply permissions to enumerate the contents of the Azure Synapse workspace

Set up authentication for enumerating dedicated SQL database resources

  1. In the Azure portal, go to the Azure Synapse workspace resource.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role on the resource.

  3. Select the Add button.

  4. Set the Reader role and enter your Azure Purview account name, which represents its managed service identity (MSI).

  5. Select Save to finish assigning the role.

Note

If you're planning to register and scan multiple Azure Synapse workspaces in your Azure Purview account, you can also assign the role from a higher level, such as a resource group or a subscription.

Set up authentication for enumerating serverless SQL database resources

There are three places you will need to set authentication to allow Purview to enumerate your serverless SQL database resources: the Synapse workspace, the associated storage, and on the Serverless databases. The steps below will set permissions for all three.

  1. In the Azure portal, go to the Azure Synapse workspace resource.

  2. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role on the resource.

  3. Select the Add button.

  4. Set the Reader role and enter your Azure Purview account name, which represents its managed service identity (MSI).

  5. Select Save to finish assigning the role.

  6. In the Azure portal, go to the Resource group or Subscription that the Azure Synapse workspace is in.

  7. On the left pane, select Access Control (IAM).

    Note

    You must be an owner or user access administrator to add a role in the Resource group or Subscription fields.

  8. Select the Add button.

  9. Set the Storage blob data reader role and enter your Azure Purview account name (which represents its MSI) in the Select box.

  10. Select Save to finish assigning the role.

  11. Go to your Azure Synapse workspace and open the Synapse Studio.

  12. Select the Data tab on the left menu.

  13. Select the ellipsis (...) next to one of your databases, and then start a new SQL script.

  14. Add the Azure Purview account MSI (represented by the account name) on the serverless SQL databases. You do so by running the following command in your SQL script:

    CREATE LOGIN [PurviewAccountName] FROM EXTERNAL PROVIDER;
    

Step 3: Apply permissions to scan the contents of the workspace

You can set up authentication for an Azure Synapse source in either of two ways:

  • Use a managed identity
  • Use a service principal

Important

These steps for serverless databases do not apply to replicated databases. Currently in Synapse, serverless databases that are replicated from Spark databases are read-only. For more information, go here.

Note

You must set up authentication on each dedicated SQL database in your Azure Synapse workspace that you intend to register and scan. The permissions that are mentioned in the following sections for serverless SQL database apply to all databases within your workspace. That is, you'll have to set up authentication only once.

Use a managed identity for dedicated SQL databases

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your dedicated SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

    Note

    To run the commands in the following procedure, you must be an Azure Synapse administrator on the workspace. For more information about Azure Synapse Analytics permissions, see: Set up access control for your Azure Synapse workspace.

  4. Add the Azure Purview account MSI (represented by the account name) as db_datareader on the dedicated SQL database. You do so by running the following command in your SQL script:

    CREATE USER [PurviewAccountName] FROM EXTERNAL PROVIDER
    GO
    
    EXEC sp_addrolemember 'db_datareader', [PurviewAccountName]
    GO
    

Use a managed identity for serverless SQL databases

  1. Go to your Azure Synapse workspace.
  2. Go to the Data section, and follow the next steps for each database you want to scan.
  3. Select the ellipsis (...) next to your database, and then start a new SQL script.
  4. Add the Azure Purview account MSI (represented by the account name) as db_datareader on the serverless SQL databases. You do so by running the following command in your SQL script:
    CREATE USER [PurviewAccountName] FOR LOGIN [PurviewAccountName];
    ALTER ROLE db_datareader ADD MEMBER [PurviewAccountName]; 
    

Grant permission to use credentials for external tables

If the Azure Synapse workspace has any external tables, the Azure Purview managed identity must be given References permission on the external table scoped credentials. With the References permission, Azure Purview can read data from external tables.

GRANT REFERENCES ON DATABASE SCOPED CREDENTIAL::[scoped_credential] TO [PurviewAccountName];

Use a service principal for dedicated SQL databases

Note

You must first set up a new credential of type Service Principal by following the instructions in Credentials for source authentication in Azure Purview.

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your dedicated SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

  4. Add the Service Principal ID as db_datareader on the dedicated SQL database. You do so by running the following command in your SQL script:

    CREATE USER [ServicePrincipalID] FROM EXTERNAL PROVIDER
    GO
    
    EXEC sp_addrolemember 'db_datareader', [ServicePrincipalID]
    GO
    

Note

Repeat the previous step for all dedicated SQL databases in your Synapse workspace.

Use a service principal for serverless SQL databases

  1. Go to your Azure Synapse workspace.

  2. Go to the Data section, and then look for one of your serverless SQL databases.

  3. Select the ellipsis (...) next to it, and then start a new SQL script.

  4. Add the Service Principal ID on the serverless SQL databases. You do so by running the following command in your SQL script:

    CREATE LOGIN [ServicePrincipalID] FROM EXTERNAL PROVIDER;
    
  5. Add Service Principal ID as db_datareader on each of the serverless SQL databases you want to scan. You do so by running the following command in your SQL script:

     CREATE USER [ServicePrincipalID] FOR LOGIN [ServicePrincipalID];
     ALTER ROLE db_datareader ADD MEMBER [ServicePrincipalID]; 
    

Step 4: Set up Azure Synapse workspace firewall access

  1. In the Azure portal, go to the Azure Synapse workspace.

  2. On the left pane, select Firewalls.

  3. For Allow Azure services and resources to access this workspace control, select ON.

  4. Select Save.

Step 5: Set up a scan on the workspace

To create and run a new scan, do the following:

  1. Select the Data Map tab on the left pane in Purview Studio.

  2. Select the data source that you registered.

  3. Select View details, and then select New scan. Alternatively, you can select the Scan quick action icon on the source tile.

  4. On the Scan details pane, in the Name box, enter a name for the scan.

  5. In the Type dropdown list, select the types of resources that you want to scan within this source. SQL Database is the only type we currently support within an Azure Synapse workspace.

    Screenshot of the details pane for the Azure Synapse source scan.

  6. In the Credential dropdown list, select the credential to connect to the resources within your data source.

  7. Within each type, you can select to scan either all the resources or a subset of them by name.

  8. Select Continue to proceed.

  9. Select Scan rule sets of type Azure Synapse SQL. You can also create scan rule sets inline.

  10. Choose your scan trigger. You can schedule it to run weekly/monthly or once.

  11. Review your scan, and then select Save to complete the setup.

View your scans and scan runs

  1. View source details by selecting view details on the tile under the sources section.

    Screenshot of the Azure Synapse Analytics source details page.

  2. View scan run details by going to the scan details page.

    • The status bar displays a brief summary of the running status of the child resources. The status is displayed on the workspace level scan.

    • Green indicates a successful scan run, red indicates a failed scan run, and gray indicates that the scan run is still in progress.

    • You can view more granular information about the scan runs by clicking into them.

      Screenshot of the Azure Synapse Analytics scan details page.

    • You can view a summary of recent failed scan runs at the bottom of the source details page. Again, you can view more granular information about the scan runs by clicking into them.

Manage your scans

To edit, delete, or cancel a scan, do the following:

  1. Go to the management center. In the Sources and scanning section, select Data sources, and then select the data source you want to manage.

  2. Select the scan you want to manage, and then select Edit.

    • To delete your scan, select Delete.
    • If a scan is currently running, you can cancel it.

Next steps