Connect to Azure Cosmos DB for SQL API in Microsoft Purview

This article outlines the process to register and scan an Azure Cosmos DB for SQL API instance in Microsoft Purview, including instructions to authenticate and interact with the Azure Cosmos DB database source.

Supported capabilities

Capability            Supported
Metadata Extraction   Yes
Full Scan             Yes
Incremental Scan      No
Scoped Scan           Yes
Classification        Yes
Labeling              Yes
Access Policy         No
Lineage               No**
Data Sharing          No
Live view             No

** Lineage is supported if the dataset is used as a source/sink in a Data Factory Copy activity.

Prerequisites

Register

This section will enable you to register the Azure Cosmos DB for SQL API instance and set up an appropriate authentication mechanism to ensure successful scanning of the data source.

Steps to register

It is important to register the data source in Microsoft Purview prior to setting up a scan for the data source.

  1. Open the Microsoft Purview governance portal

  2. Navigate to the Data Map --> Collections

    Screenshot that navigates to the Sources link in the Data Map

  3. Create the Collection hierarchy using the Collections menu and assign permissions to individual subcollections, as required

    Screenshot that shows the collection menu to create collection hierarchy

  4. Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Azure Cosmos DB database

    Screenshot that shows the collection used to register the data source

  5. Select the Azure Cosmos DB for SQL API data source and select Continue

    Screenshot that allows selection of the data source

  6. Provide a suitable Name for the data source, select the relevant Azure subscription, Cosmos DB account name, and collection, and then select Apply

    Screenshot that shows the details to be entered in order to register the data source

  7. The Azure Cosmos DB account will be shown under the selected Collection

    Screenshot that shows the data source mapped to the collection to initiate scanning
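
The registration steps above can also be pictured as a single request that names the source, points at the Cosmos DB account, and assigns it to a collection. The following is a minimal sketch only: the endpoint shape, the `api-version` value, the `AzureCosmosDb` kind, and the property names are assumptions that are not taken from this article and should be checked against the current Microsoft Purview REST reference. The function builds the request without sending it, and all account and collection names are hypothetical.

```python
import json
from urllib.parse import quote


def build_register_request(purview_account, source_name, cosmos_account, collection_name):
    """Return (url, json_body) describing a data source registration.

    Nothing is sent over the network; the caller decides how to authenticate
    and submit the request.
    """
    # ASSUMPTION: endpoint path and api-version are illustrative, not confirmed
    # by this article.
    url = (
        f"https://{purview_account}.purview.azure.com/scan/datasources/"
        f"{quote(source_name, safe='')}?api-version=2022-02-01-preview"
    )
    body = {
        # ASSUMPTION: kind and property names mirror the portal fields
        # (Cosmos DB account and collection).
        "kind": "AzureCosmosDb",
        "properties": {
            "accountUri": f"https://{cosmos_account}.documents.azure.com:443/",
            "collection": {
                "referenceName": collection_name,
                "type": "CollectionReference",
            },
        },
    }
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    # Hypothetical names, used only to show the shape of the request.
    url, body = build_register_request(
        "contoso-purview", "cosmos-sales", "contoso-cosmos", "finance"
    )
    print(url)
    print(body)
```

Keeping the payload construction separate from the HTTP call makes it easy to review what would be registered before anything is submitted.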

Scan

Authentication for a scan

To scan the data source, you need to configure an authentication method for the Azure Cosmos DB account.

There is only one way to set up authentication for Azure Cosmos DB Database:

Account Key - Secrets can be created inside an Azure Key Vault to store credentials so that Microsoft Purview can use them to scan data sources securely. A secret can be a storage account key, a SQL login password, or another password; for this source, the secret is the Azure Cosmos DB account key.

Note

You need to deploy an Azure Key Vault resource in your subscription and grant the Microsoft Purview account’s MSI the required access permissions to secrets inside the key vault.

Using Account Key for scanning

You need to get your access key and store it in the key vault:

  1. Navigate to your Azure Cosmos DB account

  2. Select Settings > Keys

    Screenshot that shows the access keys in the storage account

  3. Copy your key and save it separately for the next steps

    Screenshot that shows the access keys to be copied

  4. Navigate to your key vault

    Screenshot that shows the key vault

  5. Select Settings > Secrets and select + Generate/Import

    Screenshot that shows the key vault option to generate a secret

  6. Enter the Name, set the Value to the key from your Azure Cosmos DB account, and select Create to complete

    Screenshot that shows the key vault option to enter the secret values

  7. If your key vault is not connected to Microsoft Purview yet, you will need to create a new key vault connection

  8. Finally, create a new credential using the key to set up your scan.
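
For readers who prefer to script steps 4 to 6, the secret can also be written through the Key Vault REST interface. The sketch below only prepares the request, using the standard library, and leaves token acquisition and sending to the caller. The vault name, secret name, key, and token are hypothetical placeholders, and the `api-version` shown is an assumption to verify against the Key Vault documentation.

```python
import json
import re
import urllib.request

# Key Vault secret names allow only letters, digits, and hyphens (1-127 chars).
_SECRET_NAME = re.compile(r"[0-9a-zA-Z-]{1,127}")


def build_set_secret_request(vault_name, secret_name, account_key, bearer_token):
    """Prepare (but do not send) a request that stores the account key as a secret."""
    if not _SECRET_NAME.fullmatch(secret_name):
        raise ValueError(f"invalid Key Vault secret name: {secret_name!r}")
    # ASSUMPTION: api-version chosen for illustration only.
    url = f"https://{vault_name}.vault.azure.net/secrets/{secret_name}?api-version=7.4"
    payload = json.dumps({"value": account_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {bearer_token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; a real key must never be hard-coded or logged.
    req = build_set_secret_request(
        "contoso-kv", "cosmos-account-key", "<account-key>", "<token>"
    )
    print(req.get_method(), req.full_url)
```

The secret name chosen here is the one to select later as Secret name when creating the credential for the scan.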

Creating the scan

  1. Open your Microsoft Purview account and select Open Microsoft Purview governance portal

  2. Navigate to the Data map --> Sources to view the collection hierarchy

  3. Select the New Scan icon under the Azure Cosmos DB source registered earlier

    Screenshot that shows the screen to create a new scan

  4. Provide a Name for the scan.

  5. Choose the Azure integration runtime if your source is publicly accessible, a managed virtual network integration runtime if using a managed virtual network, or a self-hosted integration runtime if your source is in a private virtual network. For more information about which integration runtime to use, see the article on choosing the right integration runtime configuration.

  6. Choose the appropriate collection for the scan and select + New under Credential

    Screenshot that shows the Account Key option for scanning

  7. Select the appropriate Key vault connection and the Secret name that was used when storing the account key. Set Authentication method to Account Key

    Screenshot that shows the account key options

  8. Select Test connection. On a successful connection, select Continue

    Screenshot that shows Test Connection success
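
The choices made in these steps (integration runtime, collection, credential) can be summarized as a small data structure. The sketch below is illustrative only: `choose_integration_runtime` encodes the rule from step 5 with an assumed precedence (managed virtual network first), and the payload shape in `build_scan_definition`, including the `AzureCosmosDbCredential` kind, the `AccountKey` credential type, and the default rule set name, is an assumption rather than something stated in this article.

```python
def choose_integration_runtime(publicly_accessible, uses_managed_vnet=False):
    """Pick a runtime label following the rule of thumb in step 5.

    ASSUMPTION: a managed virtual network takes precedence over public access.
    """
    if uses_managed_vnet:
        return "managed virtual network integration runtime"
    if publicly_accessible:
        return "Azure integration runtime"
    return "self-hosted integration runtime"


def build_scan_definition(collection_name, credential_name,
                          ruleset_name="AzureCosmosDb", ruleset_type="System"):
    """Return a dict describing a scan that authenticates with an account key.

    ASSUMPTION: field names and values are illustrative and should be checked
    against the current Microsoft Purview REST reference.
    """
    return {
        "kind": "AzureCosmosDbCredential",
        "properties": {
            "credential": {
                "referenceName": credential_name,
                "credentialType": "AccountKey",
            },
            "scanRulesetName": ruleset_name,
            "scanRulesetType": ruleset_type,
            "collection": {
                "referenceName": collection_name,
                "type": "CollectionReference",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical collection and credential names.
    print(choose_integration_runtime(publicly_accessible=False))
    print(build_scan_definition("finance", "cosmos-account-key-cred"))
```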

Scoping and running the scan

  1. You can scope your scan to specific databases and containers by choosing the appropriate items in the list.

    Scope your scan

  2. Then select a scan rule set. You can choose between the system default, existing custom rule sets, or create a new rule set inline.

    Scan rule set

    New Scan rule

  3. You can select the classification rules to be included in the scan rule

    Scan rule set classification rules

    Scan rule set selection

  4. Choose your scan trigger. You can set up a schedule or run the scan once.

    scan trigger

  5. Review your scan and select Save and run.

    review scan

Viewing Scan

  1. Navigate to the data source in the Collection and select View Details to check the status of the scan

    view scan

  2. The scan details indicate the progress of the scan in the Last run status and the number of assets scanned and classified

    view scan details

  3. The Last run status will be updated to In progress and then Completed once the entire scan has run successfully

    view scan in progress

    view scan completed
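
The status progression described above (In progress, then Completed) lends itself to a simple polling loop when the status is read programmatically. The sketch below assumes a caller-supplied `get_status` function that returns the current Last run status as a string; how that function obtains the status is deliberately left open. Only "In progress" and "Completed" come from this article; "Queued", "Failed", and "Canceled" are assumed labels.

```python
import time

# ASSUMPTION: only "Completed" is named in the article; the other terminal
# labels are placeholders for whatever failure states the service reports.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_scan(get_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll get_status() until a terminal status is seen.

    Returns the list of distinct consecutive statuses observed, for example
    ["In progress", "Completed"]. Raises TimeoutError after max_polls attempts.
    """
    seen = []
    for _ in range(max_polls):
        status = get_status()
        # Record a status only when it changes, to keep the history short.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status in TERMINAL_STATUSES:
            return seen
        sleep(poll_seconds)
    raise TimeoutError(f"scan did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status source standing in for a real status lookup.
    fake = iter(["Queued", "In progress", "In progress", "Completed"])
    history = wait_for_scan(lambda: next(fake), poll_seconds=0.0)
    print(" -> ".join(history))
```

Injecting `get_status` and `sleep` keeps the loop independent of any particular client library and makes it straightforward to test without waiting.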

Managing Scan

Scans can be managed or run again on completion.

  1. Select the Scan name to manage the scan

    manage scan

  2. You can run the scan again, edit the scan, or delete the scan

    manage scan options

  3. You can run a Full Scan again

    full scan

Next steps

Now that you have registered your source, follow the below guides to learn more about Microsoft Purview and your data.