Connect to Azure SQL Database in Azure Purview

This article outlines the process to register an Azure SQL data source in Azure Purview including instructions to authenticate and interact with the Azure SQL database source

Supported capabilities

Metadata Extraction Full Scan Incremental Scan Scoped Scan Classification Access Policy Lineage
Yes Yes Yes Yes Yes No No**

** Lineage is supported if dataset is used as a source/sink in Data Factory Copy activity

Known limitations

  • Azure Purview doesn't support over 300 columns in the Schema tab and it will show "Additional-Columns-Truncated".

Prerequisites

Register

This section will enable you to register the Azure SQL DB data source and set up an appropriate authentication mechanism to ensure successful scanning of the data source.

Steps to register

It is important to register the data source in Azure Purview prior to setting up a scan for the data source.

  1. Go to the Azure portal, and navigate to the Purview accounts page and select your Purview account

    Screenshot that shows the Purview account used to register the data source

  2. Open Purview Studio and navigate to the Data Map

    Screenshot that navigates to the Sources link in the Data Map

  3. Create the Collection hierarchy using the Collections menu and assign permissions to individual subcollections, as required

    Screenshot that shows the collection menu to assign access control permissions to the collection hierarchy

  4. Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Azure SQL DB

    Screenshot that shows the collection used to register the data source

  5. Select the Azure SQL Database data source and select Continue

    Screenshot that allows selection of the data source

  6. Provide a suitable Name for the data source, select the relevant Azure subscription, Server name for the SQL server and the collection and select on Apply

    Screenshot that shows the details to be entered in order to register the data source

  7. The Azure SQL Server Database will be shown under the selected Collection

    Screenshot that shows the data source mapped to the collection to initiate scanning

Scan

Authentication for a scan

In order to have access to scan the data source, an authentication method in the Azure SQL Database needs to be configured. The following options are supported:

  • SQL Authentication

  • System-assigned managed identity - As soon as the Azure Purview account is created, a system-assigned managed identity (SAMI) is created automatically in Azure AD tenant, and has the same name as your Azure Purview account. Depending on the type of resource, specific RBAC role assignments are required for the Azure Purview SAMI to perform the scans.

  • User-assigned managed identity (preview) - Similar to a SAMI, a user-assigned managed identity (UAMI) is a credential resource that can be used to allow Azure Purview to authenticate against Azure Active Directory. Depending on the type of resource, specific RBAC role assignments are required when using a UAMI credential to perform scans.

  • Service Principal - In this method, you can create a new or use an existing service principal in your Azure Active Directory tenant.

Important

If you are using a self-hosted integration runtime to connect to your resource, system-assigned and user-assigned managed identities will not work. You need to use SQL Authentication or Service Principal Authentication.

Configure Azure AD authentication in the database account

The service principal or managed identity must have permission to get metadata for the database, schemas, and tables. It must also be able to query the tables to sample for classification.

Using SQL Authentication for scanning

Note

Only the server-level principal login (created by the provisioning process) or members of the loginmanager database role in the master database can create new logins. It takes about 15 minutes after granting permission, the Purview account should have the appropriate permissions to be able to scan the resource(s).

You can follow the instructions in CREATE LOGIN to create a login for Azure SQL Database if you don't have this login available. You will need username and password for the next steps.

  1. Navigate to your key vault in the Azure portal

    Screenshot that shows the key vault

  2. Select Settings > Secrets and select + Generate/Import

    Screenshot that shows the key vault option to generate a secret

  3. Enter the Name and Value as the password from your Azure SQL Database

    Screenshot that shows the key vault option to enter the sql secret values

  4. Select Create to complete

  5. If your key vault is not connected to Purview yet, you will need to create a new key vault connection

  6. Finally, create a new credential using the key to set up your scan

    Screenshot that shows the key vault option to set up credentials

    Screenshot that shows the key vault option to create a secret

Using a system or user assigned managed identity for scanning

Important

If you are using a self-hosted integration runtime to connect to your resource, system-assigned and user-assigned managed identities will not work. You need to use SQL Authentication or Service Principal Authentication.

It is important to give your Purview account's system managed identity or user-assigned managed identity the permission to scan the Azure SQL DB. You can add the SAMI or UAMI at the Subscription, Resource Group, or Resource level, depending on what you want it to have scan permissions on.

Note

You need to be an owner of the subscription to be able to add a managed identity on an Azure resource.

  1. From the Azure portal, find either the subscription, resource group, or resource (for example, an Azure SQL Database) that you would like to allow the catalog to scan.

  2. Select Access Control (IAM) in the left navigation and then select + Add --> Add role assignment

    Screenshot that shows the Azure SQL database

  3. Set the Role to Reader and enter your Azure Purview account name or user-assigned managed identity under Select input box. Then, select Save to give this role assignment to your Purview account.

    Screenshot that shows the details to assign permissions for the Purview account

Using Service Principal for scanning

Creating a new service principal

If you need to Create a new service principal, it is required to register an application in your Azure AD tenant and provide access to Service Principal in your data sources. Your Azure AD Global Administrator or other roles such as Application Administrator can perform this operation.

Getting the Service Principal's Application ID
  1. Copy the Application (client) ID present in the Overview of the Service Principal already created

    Screenshot that shows the Application (client) ID for the Service Principal

Granting the Service Principal access to your Azure SQL Database
  1. Navigate to your key vault in the Azure portal

    Screenshot that shows the key vault to add a secret for for Service Principal

  2. Select Settings > Secrets and select + Generate/Import

    Screenshot that shows the key vault option to generate a secret for Service Principal

  3. Enter the Name of your choice and Value as the Client secret from your Service Principal

    Screenshot that shows the key vault option to enter the secret values

  4. Select Create to complete

  5. If your key vault is not connected to Purview yet, you will need to create a new key vault connection

  6. Finally, create a new credential using the key to set up your scan

    Screenshot that shows the key vault option to add a credentials for Service Principal

    Screenshot that shows the key vault option to create a secret for Service Principal

Firewall settings

If your database server has a firewall enabled, you will need to update the firewall to allow access in one of two ways:

  1. Allow Azure connections through the firewall.
  2. Install a Self-Hosted Integration Runtime and give it access through the firewall.

Allow Azure Connections

Enabling Azure connections will allow Azure Purview to reach and connect the server without updating the firewall itself. You can follow the How-to guide for Connections from inside Azure.

  1. Navigate to your database account
  2. Select the server name in the Overview page
  3. Select Security > Firewalls and virtual networks
  4. Select Yes for Allow Azure services and resources to access this server Allow Azure services and resources to access this server.

Self-Hosted Integration Runtime

A self-hosted integration runtime (SHIR) can be installed on a machine to connect with a resource in a private network.

  1. Create and install a self-hosted integration runtime on a personal machine, or a machine inside the same VNet as your database server.
  2. Check your database server firewall to confirm that the SHIR machine has access through the firewall. Add the IP of the machine if it does not already have access.
  3. If your Azure SQL Server is behind a private endpoint or in a VNet, you can use an ingestion private endpoint to ensure end-to-end network isolation.

Creating the scan

  1. Open your Purview account and select the Open Purview Studio

  2. Navigate to the Data map --> Sources to view the collection hierarchy

  3. Select the New Scan icon under the Azure SQL DB registered earlier

    Screenshot that shows the screen to create a new scan

If using SQL Authentication

  1. Provide a Name for the scan, select Database selection method as Enter manually, enter the Database name and the Credential created earlier, choose the appropriate collection for the scan and select Test connection to validate the connection. Once the connection is successful, select Continue

    Screenshot that shows the SQL Authentication option for scanning

If using a system or user assigned managed identity

  1. Provide a Name for the scan, select the SAMI or UAMI under Credential, choose the appropriate collection for the scan

    Screenshot that shows the managed identity option to run the scan

  2. Select Test connection. On a successful connection, select Continue

    Screenshot that allows the managed identity option to run the scan

If using Service Principal

  1. Provide a Name for the scan, choose the appropriate collection for the scan, and select the Credential dropdown to select the credential created earlier.

    Screenshot that shows the option for service principal to enable scanning

  2. Select Test connection. On a successful connection, select Continue.

Scoping and running the scan

  1. You can scope your scan to specific folders and subfolders by choosing the appropriate items in the list.

    Scope your scan

  2. Then select a scan rule set. You can choose between the system default, existing custom rule sets, or create a new rule set inline.

    Scan rule set

  3. If creating a new scan rule set

    New Scan rule set

  4. You can select the classification rules to be included in the scan rule

    Scan rule set classification rules

    Scan rule set selection

  5. Choose your scan trigger. You can set up a schedule or run the scan once.

    scan trigger

  6. Review your scan and select Save and run.

    review scan

View Scan

  1. Navigate to the data source in the Collection and select View Details to check the status of the scan

    view scan

  2. The scan details indicate the progress of the scan in the Last run status and the number of assets scanned and classified

    view scan details

  3. The Last run status will be updated to In progress and then Completed once the entire scan has run successfully

    view scan completed

Manage Scan

Scans can be managed or run again on completion

  1. Select the Scan name to manage the scan

    manage scan

  2. You can run the scan again, edit the scan, delete the scan

    manage scan options

  3. You can run an incremental scan or a full scan again

    full or incremental scan

Next steps

Now that you have registered your source, follow the below guides to learn more about Purview and your data.