Connect to Azure SQL Database in Microsoft Purview

This article outlines the process to register an Azure SQL data source in Microsoft Purview including instructions to authenticate and interact with the Azure SQL database source

Supported capabilities

Metadata Extraction Full Scan Incremental Scan Scoped Scan Classification Access Policy Lineage
Yes Yes Yes Yes Yes Yes (Preview) Yes(Preview)**

** Lineage is also supported if Azure SQL tables/views used as source/sink in Data Factory Copy and Data Flow activities

  • Data lineage extraction is currently supported only for Stored procedure runs

When scanning Azure SQL Database, Microsoft Purview supports:

  • Extracting technical metadata including:

    • Server
    • Database
    • Schemas
    • Tables including the columns
    • Views including the columns
    • Store procedures (with lineage extraction enabled)
    • Store procedure runs (with lineage extraction enabled)

When setting up scan, you can further scope the scan after providing the database name by selecting tables and views as needed.

Known limitations

  • Microsoft Purview doesn't support over 300 columns in the Schema tab and it will show "Additional-Columns-Truncated" if there are more than 300 columns.
  • Column level lineage is currently not supported in the lineage tab. However, the columnMapping attribute in properties tab of Azure SQL Stored Procedure Run captures column lineage in plain text.
  • Stored procedures running remotely from data integration tools like Azure Data Factory is currently not supported
  • Data lineage extraction is currently not supported for Functions, Triggers.
  • Lineage extraction scan is scheduled and defaulted to run every six hours. Frequency can't be changed.
  • If sql views are referenced in stored procedures, they're captured as sql tables currently.
  • Lineage extraction is currently not supported, if Azure SQL Server is configured behind a private endpoint.

Prerequisites

Register

This section will enable you to register the Azure SQL DB data source and set up authentication to scan.

Steps to register

It's important to register the data source in Microsoft Purview before setting up a scan.

  1. Go to the Azure portal, and navigate to the Microsoft Purview accounts page and select your Purview account

  2. Open Microsoft Purview governance portal and navigate to the Data Map

    Screenshot that navigates to the Sources link in the Data Map.

  3. Create the Collection hierarchy using the Collections menu and assign permissions to individual subcollections, as required

    Screenshot that shows the collection menu to assign access control permissions to the collection hierarchy.

  4. Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Azure SQL DB

    Screenshot that shows the collection used to register the data source.

  5. Select the Azure SQL Database data source and select Continue

  6. Provide a suitable Name for the data source, select the relevant Azure subscription, Server name for the SQL server and the collection and select on Apply

    Screenshot that shows the details to be entered in order to register the data source.

  7. The Azure SQL Server Database will be shown under the selected Collection

    Screenshot that shows the data source mapped to the collection to initiate scanning.

Scan

Tip

To troubleshoot any issues with scanning:

  1. Confirm you have followed all prerequisites.
  2. Check network by confirming firewall, Azure connections, or integration runtime settings.
  3. Confirm authentication is properly set up.
  4. Review our scan troubleshooting documentation.

Firewall settings

If your database server has a firewall enabled, you'll need to update the firewall to allow access in one of two ways:

  1. Allow Azure connections through the firewall - a straightforward option to route traffic through Azure networking, without needing to manage virtual machines.
  2. Install a Self-Hosted Integration Runtime on a machine in your network and give it access through the firewall - if you have a private VNet set up within Azure, or have any other closed network set up, using a self-hosted integration runtime on a machine within that network will allow you to fully manage traffic flow and utilize your existing network.
  3. Use a managed virtual network - setting up a managed virtual network with your Microsoft Purview account will allow you to connect to Azure SQL using the Azure integration runtime in a closed network.

For more information about the Azure SQL Firewall, see the SQL Database firewall documentation. To connect Microsoft Purview through the firewall, follow the steps below.

Allow Azure Connections

Enabling Azure connections will allow Microsoft Purview to reach and connect the server without updating the firewall itself. You can follow the How-to guide for Connections from inside Azure.

  1. Navigate to your database account
  2. Select the server name in the Overview page
  3. Select Security > Firewalls and virtual networks
  4. Select Yes for Allow Azure services and resources to access this server Allow Azure services and resources to access this server.

Self-Hosted Integration Runtime

A self-hosted integration runtime (SHIR) can be installed on a machine to connect with a resource in a private network.

  1. Create and install a self-hosted integration runtime on a personal machine, or a machine inside the same VNet as your database server.
  2. Check your database server networking configuration to confirm that there's a private endpoint accessible to the SHIR machine. Add the IP of the machine if it doesn't already have access.
  3. If your Azure SQL Server is behind a private endpoint or in a VNet, you can use an ingestion private endpoint to ensure end-to-end network isolation.

Authentication for a scan

To scan your data source, you'll need to configure an authentication method in the Azure SQL Database.

Important

If you are using a self-hosted integration runtime to connect to your resource, system-assigned and user-assigned managed identities will not work. You need to use Service Principal authentication or SQL authentication.

The following options are supported:

  • System-assigned managed identity (Recommended) - This is an identity associated directly with your Microsoft Purview account that allows you to authenticate directly with other Azure resources without needing to manage a go-between user or credential set. The system-assigned managed identity is created when your Microsoft Purview resource is created, is managed by Azure, and uses your Microsoft Purview account's name. The SAMI can't currently be used with a self-hosted integration runtime for Azure SQL. For more information, see the managed identity overview.

  • User-assigned managed identity (preview) - Similar to a SAMI, a user-assigned managed identity (UAMI) is a credential resource that allows Microsoft Purview to authenticate against Azure Active Directory. The user-assigned managed by users in Azure, rather than by Azure itself, which gives you more control over security. The UAMI can't currently be used with a self-hosted integration runtime for Azure SQL. For more information, see our guide for user-assigned managed identities.

  • Service Principal- A service principal is an application that can be assigned permissions like any other group or user, without being associated directly with a person. Their authentication has an expiration date, and so can be useful for temporary projects. For more information, see the service principal documentation.

  • SQL Authentication - connect to the SQL database with a username and password. For more information about SQL Authentication, you can follow the SQL authentication documentation. If you need to create a login, follow this guide to query an Azure SQL database, and use this guide to create a login using T-SQL.

    Note

    Be sure to select the Azure SQL Database option on the page.

Select your chosen method of authentication from the tabs below for steps to authenticate with your Azure SQL Database.

Note

Only the server-level principal login (created by the provisioning process) or members of the loginmanager database role in the master database can create new logins. It takes about 15 minutes after granting permission, the Microsoft Purview account should have the appropriate permissions to be able to scan the resource(s).

  1. You'll need a SQL login with at least db_datareader permissions to be able to access the information Microsoft Purview needs to scan the database. You can follow the instructions in CREATE LOGIN to create a sign-in for Azure SQL Database. You'll need to save the username and password for the next steps.

  2. Navigate to your key vault in the Azure portal.

  3. Select Settings > Secrets and select + Generate/Import

    Screenshot that shows the key vault option to generate a secret.

  4. Enter the Name and Value as the password from your Azure SQL Database.

  5. Select Create to complete

  6. If your key vault isn't connected to Microsoft Purview yet, you'll need to create a new key vault connection

  7. Finally, create a new credential using the key to set up your scan.

    Screenshot that shows the key vault option to set up credentials.

    Screenshot that shows the key vault option to create a secret.

Creating the scan

  1. Open your Microsoft Purview account and select the Open Microsoft Purview governance portal

  2. Navigate to the Data map --> Sources to view the collection hierarchy

  3. Select the New Scan icon under the Azure SQL DB registered earlier

    Screenshot that shows the screen to create a new scan.

Navigate to lineage section to learn more about data lineage from Azure SQL DB

Select your method of authentication from the tabs below for scanning steps.

  1. Provide a Name for the scan, select Database selection method as Enter manually, enter the Database name and the Credential created earlier, choose the appropriate collection for the scan and select Test connection to validate the connection. Once the connection is successful, select Continue

    Screenshot that shows the SQL Authentication option for scanning.

Scoping and running the scan

  1. You can scope your scan to specific database objects by choosing the appropriate items in the list.

    Scope your scan.

  2. Then select a scan rule set. You can choose between the system default, existing custom rule sets, or create a new rule set inline.

    Scan rule set.

  3. If creating a new scan rule set

    New Scan rule set.

  4. You can select the classification rules to be included in the scan rule

    Scan rule set classification rules.

    Scan rule set selection.

  5. Choose your scan trigger. You can set up a schedule or run the scan once.

  6. Review your scan and select Save and run.

View Scan

  1. Navigate to the data source in the Collection and select View Details to check the status of the scan

    view scan.

  2. The scan details indicate the progress of the scan in the Last run status and the number of assets scanned and classified

  3. The Last run status will be updated to In progress and then Completed once the entire scan has run successfully

    view scan completed.

Manage Scan

Scans can be managed or run again on completion

  1. Select the Scan name to manage the scan

    manage scan.

  2. You can run the scan again, edit the scan, delete the scan

    manage scan options.

  3. You can run an incremental scan or a full scan again

    full or incremental scan.

Access policy

To create an access policy for Azure Data Lake Storage Gen 2, follow these guides:

  • Single SQL account - This guide will allow you to enable access policies on a single Azure SQL Database account in your subscription.
  • All sources in a subscription or resource group - This guide will allow you to enable access policies on all enabled and available sources in a resource group, or across an Azure subscription.

Lineage (Preview)

Microsoft Purview supports lineage from Azure SQL Database. At the time of setting up a scan, enable lineage extraction toggle button to extract lineage.

Prerequisites for setting up scan with Lineage extraction

  1. Follow steps under authentication for a scan using Managed Identity section to authorize Microsoft Purview scan your Azure SQL Database

  2. Sign in to Azure SQL Database with Azure AD account and assign db_owner permissions to the Microsoft Purview Managed identity. Use below example SQL syntax to create user and grant permission by replacing 'purview-account' with your Account name:

    Create user <purview-account> FROM EXTERNAL PROVIDER
    GO
    EXEC sp_addrolemember 'db_owner', <purview-account> 
    GO
    
  3. Run below command on your Azure SQL Database to create master Key

    Create master key
    Go
    

Creating scan with lineage extraction toggle turned on

  1. Enable lineage extraction toggle in the scan screen

    Screenshot that shows the screen to create a new scan with lineage extraction.

  2. Select your method of authentication by following steps in the scan section

  3. Once the scan is successfully set up from previous step, a new scan type called Lineage extraction will run incremental scans every 6 hours to extract lineage from Azure SQL Database. Lineage is extracted based on the actual stored procedure runs in the Azure SQL Database

    Screenshot that shows the screen that runs lineage extraction every 6 hours.

Search Azure SQL Database assets and view runtime lineage

You can browse data catalog or search data catalog to view asset details for Azure SQL Database. The following steps describe how-to view runtime lineage details.

  1. Go to asset -> lineage tab, you can see the asset lineage when applicable. Refer to the supported capabilities section on the supported Azure SQL Database lineage scenarios. For more information about lineage in general, see data lineage and lineage user guide.

    Screenshot that shows the screen with lineage from stored procedures.

  2. Go to stored procedure asset -> Properties -> Related assets to see the latest run details of stored procedures.

    Screenshot that shows the screen with stored procedure properties containing runs.

  3. Select the stored procedure hyperlink next to Runs to see Azure SQL Stored Procedure Run Overview. Go to properties tab to see enhanced run time information from stored procedure. For example: executedTime, rowcount, Client Connection, and so on.

    Screenshot that shows the screen with stored procedure run properties.

Troubleshooting steps

  • If no lineage is captured after a successful Lineage extraction run, it's possible that no stored procedures have run at least once since the scan is set up.
  • Lineage is captured for stored procedure runs that happened after a successful scan is set up. Lineage from past Stored procedure runs isn't captured.
  • If your database is processing heavy workloads with lots of stored procedure runs, lineage extraction will filter only the most recent runs. Stored procedure runs early in the 6 hour window or the run instances that create heavy query load won't be extracted. Contact support if you're missing lineage from any stored procedure runs.

Next steps

Now that you've registered your source, follow the below guides to learn more about Microsoft Purview and your data.