Connect to and manage multiple Azure sources in Azure Purview
This article outlines how to register multiple Azure sources and how to authenticate and interact with them in Azure Purview. For more information about Azure Purview, read the introductory article.
Supported capabilities
| Metadata Extraction | Full Scan | Incremental Scan | Scoped Scan | Classification | Access Policy | Lineage |
|---|---|---|---|---|---|---|
| Yes | Yes | Yes | Yes | Yes | No | Source dependent |
Prerequisites
An Azure account with an active subscription. Create an account for free.
An active Purview resource.
You will need to be a Data Source Administrator and Data Reader to register a source and manage it in the Purview Studio. See our Azure Purview Permissions page for details.
Register
This section describes how to register multiple Azure sources in Azure Purview using the Purview Studio.
Prerequisites for registration
Before you register, give Azure Purview permission to enumerate the resources under the subscription or resource group:
- Go to the subscription or the resource group in the Azure portal.
- Select Access Control (IAM) from the left menu.
- Select +Add.
- In the Select input box, select the Reader role and enter your Azure Purview account name, which is also the name of its managed identity (MSI).
- Select Save to finish the role assignment.
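The same Reader assignment can also be scripted instead of clicked through. The following is a minimal sketch, assuming you call the Azure Resource Manager role assignments REST API directly and already know the object ID of the Azure Purview managed identity. The function name `build_reader_assignment` and its parameters are hypothetical. The sketch only builds the request description; it does not authenticate or send anything.

```python
import uuid

# Built-in Azure "Reader" role definition ID (the same GUID in every tenant).
READER_ROLE_ID = "acdd72a7-3385-48ef-bd42-f606fba81ae7"
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2022-04-01"


def build_reader_assignment(scope, principal_id, assignment_name=None):
    """Describe the ARM request that assigns Reader to principal_id at scope.

    scope: "/subscriptions/<id>" or "/subscriptions/<id>/resourceGroups/<name>".
    principal_id: object ID of the Purview managed identity (assumed to be
    looked up beforehand; this helper does not resolve it).
    """
    parts = scope.strip("/").split("/")
    if len(parts) < 2 or parts[0].lower() != "subscriptions":
        raise ValueError("scope must start with /subscriptions/<id>")
    subscription_id = parts[1]
    normalized_scope = "/" + "/".join(parts)
    # Role assignment names are GUIDs; generate one unless the caller supplies it.
    name = assignment_name or str(uuid.uuid4())
    return {
        "method": "PUT",
        "url": (
            f"{ARM_ENDPOINT}{normalized_scope}"
            f"/providers/Microsoft.Authorization/roleAssignments/{name}"
            f"?api-version={API_VERSION}"
        ),
        "body": {
            "properties": {
                "roleDefinitionId": (
                    f"/subscriptions/{subscription_id}"
                    f"/providers/Microsoft.Authorization/roleDefinitions/{READER_ROLE_ID}"
                ),
                "principalId": principal_id,
                # Managed identities are service principals in Azure AD.
                "principalType": "ServicePrincipal",
            }
        },
    }
```

To apply the assignment, you would send the returned body to the returned URL with a bearer token for Azure Resource Manager, using whichever HTTP client and credential flow you already rely on.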
Authentication for registration
There are two ways to set up authentication for multiple sources in Azure:
- Managed identity
- Service principal
You must set up authentication on each resource within your subscription or resource group that you want to register and scan. Azure Storage resource types (Azure Blob Storage and Azure Data Lake Storage Gen2) make this easy: you can add the managed identity or service principal as a Storage Blob Data Reader at the subscription or resource group level. The permissions then trickle down to each storage account within that subscription or resource group. For all other resource types, you must apply the managed identity or service principal on each resource, or create a script to do so.
To learn how to add permissions on each resource type within a subscription or resource group, see the following resources:
- Azure Blob Storage
- Azure Data Lake Storage Gen1
- Azure Data Lake Storage Gen2
- Azure SQL Database
- Azure SQL Managed Instance
- Azure Synapse Analytics
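To make the split between inherited and per-resource permissions concrete, here is a small illustrative sketch. It assumes you already have a list of resource IDs and their Azure resource types (for example, from your own inventory); the function names are hypothetical. It sorts the resources inside a registration scope into storage accounts, which are covered by a scope-level Storage Blob Data Reader assignment, and everything else, which still needs permissions applied resource by resource.

```python
# Azure resource type shared by Blob Storage and Data Lake Storage Gen2 accounts.
STORAGE_TYPE = "microsoft.storage/storageaccounts"


def is_within_scope(scope, resource_id):
    """Return True if resource_id is the scope itself or sits underneath it."""
    s = scope.rstrip("/").lower()
    r = resource_id.rstrip("/").lower()
    # Require a path-segment boundary so "rg1" does not match "rg10".
    return r == s or r.startswith(s + "/")


def plan_permissions(scope, resources):
    """Split resources into (inherited, per_resource) lists of resource IDs.

    resources: iterable of (resource_id, resource_type) pairs.
    inherited: storage accounts covered by a scope-level role assignment.
    per_resource: resources that need permissions applied individually.
    Resources outside the scope are ignored.
    """
    inherited, per_resource = [], []
    for resource_id, resource_type in resources:
        if not is_within_scope(scope, resource_id):
            continue
        if resource_type.lower() == STORAGE_TYPE:
            inherited.append(resource_id)
        else:
            per_resource.append(resource_id)
    return inherited, per_resource
```

The `per_resource` list is the set you would feed into a script that follows the resource-specific guides listed above.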
Steps to register
Go to your Azure Purview account.
Select Data Map on the left menu.
Select Register.
On Register sources, select Azure (multiple).
Select Continue.
On the Register sources (Azure) screen, do the following:
In the Name box, enter a name that the data source will be listed with in the catalog.
In the Management group box, optionally choose a management group to filter down to.
In the Subscription and Resource group dropdown list boxes, select a subscription or a specific resource group, respectively. The registration scope will be set to the selected subscription or resource group.
In the Select a collection box, select a collection or create a new one (optional).
Select Register to register the data sources.
Scan
Follow the steps below to scan multiple Azure sources to automatically identify assets and classify your data. For more information about scanning in general, see our introduction to scans and ingestion.
Create and run scan
To create and run a new scan, do the following:
Select the Data Map tab on the left pane in the Purview Studio.
Select the data source that you registered.
Select View details > + New scan, or use the Scan quick-action icon on the source tile.
For Name, enter a name for the scan.
For Type, select the types of resources that you want to scan within this source. Choose one of these options:
- Leave it as All. This selection includes future resource types that might not currently exist within that subscription or resource group.
- Use the boxes to specifically select resource types that you want to scan. If you choose this option, future resource types that might be created within this subscription or resource group won't be included for scans, unless the scan is explicitly edited in the future.
Select the credential to connect to the resources within your data source:
- You can select a credential at the parent level: either the Azure Purview managed identity (MSI) or a credential for a particular service principal. You can then use that credential for all the resource types under the subscription or resource group.
- You can specifically select the resource type and apply a different credential for that resource type.
Each credential will be considered as the method of authentication for all the resources under a particular type. You must set the chosen credential on the resources in order to successfully scan them, as described earlier in this article.
Within each type, you can select to either scan all the resources or scan a subset of them by name:
- If you leave the option as All, then future resources of that type will also be scanned in future scan runs.
- If you select specific storage accounts or SQL databases, then future resources of that type created within this subscription or resource group will not be included for scans, unless the scan is explicitly edited in the future.
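The difference between All and an explicit selection is easiest to see across two scan runs. The sketch below is a simplified model of that behavior, not Azure Purview's implementation; `resolve_targets` and the resource names are hypothetical.

```python
def resolve_targets(selection, existing):
    """Return the resources a scan run would cover, in sorted order.

    selection: the string "All", or a collection of explicitly chosen names.
    existing: names of the resources that exist when the scan run starts.
    """
    if selection == "All":
        # "All" is evaluated at run time, so newly created resources are included.
        return sorted(existing)
    # An explicit selection is fixed until someone edits the scan.
    return sorted(set(selection) & set(existing))


# First run: two storage accounts exist.
first_run = {"storage-a", "storage-b"}
# Later run: a third account has been created in the same resource group.
later_run = {"storage-a", "storage-b", "storage-c"}

all_later = resolve_targets("All", later_run)
explicit_later = resolve_targets(["storage-a", "storage-b"], later_run)
```

Here `all_later` contains `storage-c`, while `explicit_later` does not, which mirrors the two options described above.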
Select Test connection. The test first checks whether you've assigned the Azure Purview managed identity (MSI) as a Reader on the subscription or resource group. If you get an error message, follow these instructions to resolve it. The test then checks your authentication and connection to each of your selected sources and generates a report. The more sources you select, the longer the report takes to generate. If the test fails on some resources, hover over the X icon to display the detailed error message.
After the test connection has passed, select Continue to proceed.
Select scan rule sets for each resource type that you chose in the previous step. You can also create scan rule sets inline.
Choose your scan trigger. You can schedule it to run weekly, monthly, or once.
Review your scan and select Save to complete setup.
View your scans and scan runs
View source details by selecting View details on the tile under the Data Map section.
View scan run details by going to the Scan details page.
The status bar is a brief summary of the running status of the child resources. It's displayed on the subscription level or resource group level. The colors have the following meanings:
- Green: The scan was successful.
- Red: The scan failed.
- Gray: The scan is still in progress.
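As a rough illustration of how such a status bar can be derived, the sketch below counts child scan statuses per color. The status strings (`Succeeded`, `Failed`, `InProgress`) and the function name are assumptions made for this example; the article does not specify the values Azure Purview uses internally.

```python
from collections import Counter

# Hypothetical status names mapped to the colors described above.
STATUS_COLORS = {
    "Succeeded": "green",
    "Failed": "red",
    "InProgress": "gray",
}


def summarize_status_bar(child_statuses):
    """Count how many child resource scans fall under each color."""
    return Counter(STATUS_COLORS[status] for status in child_statuses)


summary = summarize_status_bar(["Succeeded", "Succeeded", "Failed", "InProgress"])
```

With the sample input, `summary` reports two green segments, one red, and one gray.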
You can select each scan to view finer details.
View a summary of recent failed scan runs at the bottom of the source details. You can also view more granular details about these runs.
Manage your scans: edit, delete, or cancel
To manage a scan, do the following:
Go to the management center.
Select Data sources under the Sources and scanning section, and then select the desired data source.
Select the scan that you want to manage. Then:
- You can edit the scan by selecting Edit.
- You can delete the scan by selecting Delete.
- If the scan is running, you can cancel it by selecting Cancel.
Next steps
Now that you have registered your source, follow the guides below to learn more about Purview and your data.