Connect to Azure Blob storage in Azure Purview
This article outlines how to register an Azure Blob Storage account in Azure Purview, including instructions to authenticate and interact with the Azure Blob Storage source.
Supported capabilities
| Metadata Extraction | Full Scan | Incremental Scan | Scoped Scan | Classification | Access Policy | Lineage |
|---|---|---|---|---|---|---|
| Yes | Yes | Yes | Yes | Yes | Yes | Limited** |
** Lineage is supported if the dataset is used as a source or sink in a Data Factory Copy activity.
For file types such as CSV, TSV, PSV, and SSV, the schema is extracted when the first row meets all of the following conditions:
- First row values are non-empty
- First row values are unique
- First row values are not a date or a number
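The three conditions can be expressed as a small check on the first row of a delimited file. This is a minimal sketch for illustration, not Purview's implementation; in particular, the date formats in `DATE_FORMATS` are an assumption, because this article does not say how Purview recognizes a date.

```python
from datetime import datetime
from typing import Sequence

# Hypothetical date formats used to decide whether a value "is a date".
# Purview's actual date detection is not described in this article.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y", "%d-%m-%Y")


def is_number(value: str) -> bool:
    """True if the value parses as a float (note: this also accepts 'nan' and 'inf')."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def is_date(value: str) -> bool:
    """True if the value matches one of the assumed DATE_FORMATS."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        return True
    return False


def first_row_is_header(first_row: Sequence[str]) -> bool:
    """Apply the three first-row conditions: non-empty, unique, not a date or number."""
    values = [v.strip() for v in first_row]
    if not values or any(v == "" for v in values):
        return False  # a value is empty
    if len(set(values)) != len(values):
        return False  # values are not unique
    # No value may look like a number or a date.
    return not any(is_number(v) or is_date(v) for v in values)
```

For example, you could read the first row with `next(csv.reader(f))` (or pass `delimiter="\t"` for TSV) and hand it to `first_row_is_header`; a `False` result suggests the header row would not yield a schema.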
Prerequisites
An Azure account with an active subscription. Create an account for free.
An active Purview resource.
You need the Data Source Administrator and Data Reader roles to register a source and manage it in Purview Studio. See our Azure Purview Permissions page for details.
Register
This section describes how to register the Azure Blob storage account and set up an appropriate authentication mechanism so that scans of the data source succeed.
Steps to register
You must register the data source in Azure Purview before you can set up a scan for it.
Go to the Azure portal, navigate to the Purview accounts page, and select your Purview account
Open Purview Studio and navigate to the Data Map --> Sources
Create the Collection hierarchy using the Collections menu and assign permissions to individual subcollections, as required
Navigate to the appropriate collection under the Sources menu and select the Register icon to register a new Azure Blob data source
Select the Azure Blob Storage data source and select Continue
Provide a suitable Name for the data source; select the relevant Azure subscription, the existing Azure Blob Storage account name, and the collection; and then select Apply
The Azure Blob storage account will be shown under the selected Collection
Scan
Authentication for a scan
To scan the data source, you need to configure an authentication method for the Azure Blob Storage account.
The following options are supported:
Note
If you have a firewall enabled for the storage account, you must use the managed identity authentication method when setting up a scan.
System-assigned managed identity (Recommended) - As soon as the Azure Purview account is created, a system-assigned managed identity (SAMI) is created automatically in the Azure AD tenant. Depending on the type of resource, the Azure Purview SAMI needs specific RBAC role assignments to perform scans.
User-assigned managed identity (preview) - Similar to a system-assigned managed identity, a user-assigned managed identity (UAMI) is a credential resource that allows Azure Purview to authenticate against Azure Active Directory. For more information, see our User-assigned managed identity guide.
Account Key - Credentials can be stored as secrets in an Azure Key Vault, which Azure Purview then uses to access and scan data sources securely. A secret can be a storage account key, a SQL login password, or another password.
Note
If you use this option, you need to deploy an Azure Key Vault resource in your subscription and grant the Azure Purview account's SAMI the required access permissions to secrets in that key vault.
Service Principal - With this method, you can create a new service principal or use an existing one in your Azure Active Directory tenant.
Using a system-assigned or user-assigned managed identity for scanning
Your Purview account needs permission to scan the Azure Blob data source. You can grant access to the SAMI or UAMI at the Subscription, Resource Group, or Resource level, depending on the scope at which scan permission is needed.
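Each of these levels corresponds to an Azure Resource Manager scope string, and a role assignment made at a broader scope is inherited by the resource groups and resources beneath it. The sketch below builds the scope for each level; the function name and parameters are hypothetical, chosen only for illustration.

```python
from typing import Optional


def role_assignment_scope(
    subscription_id: str,
    resource_group: Optional[str] = None,
    storage_account: Optional[str] = None,
) -> str:
    """Build the ARM scope for a subscription, resource group, or storage account."""
    scope = f"/subscriptions/{subscription_id}"
    if resource_group is None:
        if storage_account is not None:
            raise ValueError("a storage account scope also needs its resource group")
        return scope  # subscription level
    scope += f"/resourceGroups/{resource_group}"
    if storage_account is None:
        return scope  # resource group level
    # Resource level: a single storage account.
    return scope + f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
```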
Note
You need to be an owner of the subscription to be able to add a managed identity on an Azure resource.
From the Azure portal, find either the subscription, resource group, or resource (for example, an Azure Blob storage account) that you would like to allow the catalog to scan.
Select Access Control (IAM) in the left navigation and then select + Add --> Add role assignment
Set the Role to Storage Blob Data Reader and enter your Azure Purview account name or user-assigned managed identity in the Select input box. Then, select Save to give this role assignment to your Purview account.
Go to your Azure Blob storage account in the Azure portal
Navigate to Security + networking > Networking
Choose Selected Networks under Allow access from
In the Exceptions section, select Allow trusted Microsoft services to access this storage account and then select Save
Note
For more details, see the steps in Authorize access to blobs and queues using Azure Active Directory.
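In the storage account's network rule set, Selected networks corresponds to a `defaultAction` of `Deny`, and the trusted-services exception corresponds to `AzureServices` appearing in the comma-separated `bypass` value. The sketch below checks a rule set of that shape, for example the `networkRuleSet` object printed by `az storage account show`; the helper itself is hypothetical.

```python
from typing import Any, Mapping


def trusted_services_allowed(network_rule_set: Mapping[str, Any]) -> bool:
    """True if trusted Microsoft services can reach the storage account."""
    if network_rule_set.get("defaultAction", "Allow") == "Allow":
        return True  # "All networks": the firewall is not restricting access
    # "Selected networks": trusted services need the AzureServices bypass.
    bypass = network_rule_set.get("bypass") or "None"
    flags = {flag.strip() for flag in bypass.split(",")}
    return "AzureServices" in flags
```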
Using Account Key for scanning
When the selected authentication method is Account Key, you need to get your access key and store it in the key vault:
Navigate to your Azure Blob storage account
Select Security + networking > Access keys
Copy your key and save it separately for the next steps
Navigate to your key vault
Select Settings > Secrets and select + Generate/Import
Enter a Name, and set the Value to the key from your storage account
Select Create to complete
If your key vault is not connected to Purview yet, you will need to create a new key vault connection
Finally, create a new credential using the key to set up your scan
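The key-copying steps can also be scripted. The following is a sketch assuming the Azure SDK for Python: the function accepts clients that behave like `StorageManagementClient` (from `azure-mgmt-storage`) and `SecretClient` (from `azure-keyvault-secrets`), and relies only on `storage_accounts.list_keys` and `set_secret`. The function name is hypothetical, and storing the first key (key1) is an arbitrary choice.

```python
import re

# Key Vault object names: 1-127 characters, letters, digits, and hyphens only.
SECRET_NAME_PATTERN = re.compile(r"[0-9A-Za-z-]{1,127}")


def store_account_key(storage_client, secret_client, resource_group, account_name, secret_name):
    """Copy the storage account's first access key (key1) into a Key Vault secret.

    storage_client: behaves like azure.mgmt.storage.StorageManagementClient
    secret_client: behaves like azure.keyvault.secrets.SecretClient
    """
    if not SECRET_NAME_PATTERN.fullmatch(secret_name):
        raise ValueError(f"invalid Key Vault secret name: {secret_name!r}")
    result = storage_client.storage_accounts.list_keys(resource_group, account_name)
    key_value = result.keys[0].value  # use result.keys[1] to store key2 instead
    secret_client.set_secret(secret_name, key_value)
    return secret_name
```

With real clients, you might pass `StorageManagementClient(credential, subscription_id)` and `SecretClient(vault_url="https://<vault-name>.vault.azure.net", credential=credential)`, where `credential` is, for example, a `DefaultAzureCredential` from `azure-identity`. Passing the clients in keeps credential handling with the caller and lets you exercise the logic without Azure.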
Using Service Principal for scanning
Creating a new service principal
To create a new service principal, register an application in your Azure AD tenant and grant the service principal access to your data sources. A user with the Azure AD Global Administrator role, or another role such as Application Administrator, can perform this operation.
Getting the Service Principal's Application ID
Copy the Application (client) ID from the Overview page of the service principal you created
Granting the Service Principal access to your Azure Blob account
Your service principal needs permission to scan the Azure Blob data source. You can grant access to the service principal at the Subscription, Resource Group, or Resource level, depending on the scope at which scan access is needed.
Note
You need to be an owner of the subscription to be able to add a service principal on an Azure resource.
From the Azure portal, find either the subscription, resource group, or resource (for example, an Azure Blob storage account) that you would like to allow the catalog to scan.
Select Access Control (IAM) in the left navigation and then select + Add --> Add role assignment
Set the Role to Storage Blob Data Reader and enter your service principal in the Select input box. Then, select Save to give this role assignment to your service principal.
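Behind the portal, a role assignment is an Azure Resource Manager PUT request on `Microsoft.Authorization/roleAssignments` at the chosen scope. The sketch below builds the request path and body for the built-in Storage Blob Data Reader role; the helper is hypothetical and leaves out the `api-version` query parameter and authentication. Note that `principalId` is the service principal's object ID, not the Application (client) ID copied earlier. Managed identities are also service principals, so the same body shape applies to the Purview SAMI or a UAMI.

```python
import uuid
from typing import Dict, Tuple

# Built-in role definition ID for "Storage Blob Data Reader".
STORAGE_BLOB_DATA_READER = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"


def role_assignment_request(
    scope: str, subscription_id: str, principal_object_id: str
) -> Tuple[str, Dict]:
    """Return (path, body) for a PUT that grants Storage Blob Data Reader at `scope`."""
    # Each role assignment is named by a new GUID.
    assignment_name = str(uuid.uuid4())
    path = f"{scope}/providers/Microsoft.Authorization/roleAssignments/{assignment_name}"
    body = {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers/"
                f"Microsoft.Authorization/roleDefinitions/{STORAGE_BLOB_DATA_READER}"
            ),
            # Object ID of the service principal (or managed identity),
            # not its Application (client) ID.
            "principalId": principal_object_id,
            "principalType": "ServicePrincipal",
        }
    }
    return path, body
```

The path would be appended to `https://management.azure.com`, the Azure Resource Manager endpoint.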
Creating the scan
Open your Purview account and select Open Purview Studio
Navigate to the Data map --> Sources to view the collection hierarchy
Select the New Scan icon under the Azure Blob data source registered earlier
If using a system-assigned or user-assigned managed identity
Provide a Name for the scan, select the Purview account's SAMI or UAMI under Credential, choose the appropriate collection for the scan, and select Test connection. On a successful connection, select Continue
If using Account Key
Provide a Name for the scan, choose the appropriate collection for the scan, select Account Key as the Authentication method, and then select Create
If using Service Principal
Provide a Name for the scan, choose the appropriate collection for the scan, and select the + New under Credential
Select the appropriate Key vault connection and the Secret name that was used while creating the Service Principal. The Service Principal ID is the Application (client) ID copied earlier
Select Test connection. On a successful connection, select Continue
Scoping and running the scan
You can scope your scan to specific folders and subfolders by choosing the appropriate items in the list.
Then select a scan rule set. You can choose between the system default, existing custom rule sets, or create a new rule set inline.
If creating a new scan rule set, select the file types to be included in the scan rule.
You can select the classification rules to be included in the scan rule
Choose your scan trigger. You can set up a schedule or run the scan once.
Review your scan and select Save and run.
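Conceptually, the folder scope and the file types in the scan rule set act as two filters on the blobs in the account. The sketch below illustrates that idea on a plain list of blob paths; it is not how Purview evaluates scopes or rule sets, and the function name and the rule that an empty folder list means no folder restriction are assumptions.

```python
from typing import Iterable, List, Sequence


def blobs_in_scope(
    blob_paths: Iterable[str],
    selected_folders: Sequence[str],
    file_types: Sequence[str],
) -> List[str]:
    """Keep blobs under a selected folder whose extension is a selected file type.

    An empty `selected_folders` is treated as "no folder restriction" (assumption).
    """
    prefixes = [folder.rstrip("/") + "/" for folder in selected_folders]
    extensions = {ft.lower().lstrip(".") for ft in file_types}
    in_scope = []
    for path in blob_paths:
        # Folder scope: the path must start with one of the selected folders.
        if prefixes and not any(path.startswith(p) for p in prefixes):
            continue
        # File-type filter: compare the extension of the blob's file name.
        name = path.rsplit("/", 1)[-1]
        extension = name.rsplit(".", 1)[1].lower() if "." in name else ""
        if extension in extensions:
            in_scope.append(path)
    return in_scope
```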
Viewing the scan
Navigate to the data source in the Collection and select View Details to check the status of the scan
The scan details show the progress of the scan in Last run status, along with the number of assets scanned and classified
The Last run status will be updated to In progress and then Completed once the entire scan has run successfully
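If you automate scans, checking the status follows the same pattern: poll until the run leaves the in-progress state. In this sketch, `get_status` is a hypothetical callable that you would supply, for example by wrapping a Purview REST call; the status strings mirror the Purview Studio labels and are assumptions, since API responses may use different values.

```python
import time
from typing import Callable


def wait_for_scan(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it no longer reports "In progress"; return the final status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "In progress":
            return status  # e.g. "Completed", or a failure status
        if time.monotonic() >= deadline:
            raise TimeoutError("scan is still in progress after the timeout")
        time.sleep(poll_seconds)
```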
Managing the scan
After a scan completes, you can manage it or run it again
Select the Scan name to manage the scan
You can run the scan again, edit it, or delete it
You can run an incremental scan or a full scan again
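The difference between the two scan types can be sketched as follows, assuming an incremental scan considers only blobs modified since the previous scan while a full scan considers every blob. This is a conceptual illustration with a hypothetical function, not a description of how Purview tracks changes internally.

```python
from datetime import datetime
from typing import Dict, List, Optional


def blobs_to_scan(
    last_modified: Dict[str, datetime], previous_scan: Optional[datetime]
) -> List[str]:
    """Full scan if previous_scan is None, else only blobs modified after it."""
    if previous_scan is None:
        return sorted(last_modified)  # full scan: every blob
    # Incremental scan: only blobs changed since the previous scan.
    return sorted(path for path, modified in last_modified.items() if modified > previous_scan)
```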
Access policy
Supported regions
Azure Purview (management side)
The Purview access policies capability is available in all Azure Purview regions
Azure Storage (enforcement side)
Purview access policies can only be enforced in the following Azure Storage regions:
- France Central
- Canada Central
Enable access policy enforcement for the Azure Storage account
Run the following PowerShell commands in the subscription where the Azure Storage account resides. The feature registration covers all Azure Storage accounts in that subscription.
```powershell
# Install the Az module
Install-Module -Name Az -Scope CurrentUser -Repository PSGallery -Force
# Log in to the subscription
Connect-AzAccount -Subscription <SubscriptionID>
# Register the feature
Register-AzProviderFeature -FeatureName AllowPurviewPolicyEnforcement -ProviderNamespace Microsoft.Storage
# Check the registration state (it can show Registering before it changes to Registered)
Get-AzProviderFeature -FeatureName AllowPurviewPolicyEnforcement -ProviderNamespace Microsoft.Storage
```
If the output of the last command shows a RegistrationState of Registered, your subscription is enabled for this functionality.
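Registration can take a while, so you may want to script the prerequisites check. If you use the Azure CLI instead of PowerShell, `az feature show --namespace Microsoft.Storage --name AllowPurviewPolicyEnforcement` prints JSON with the state under `properties.state`. The hypothetical helper below parses that JSON and also checks that the storage account's location is one of the enforcement regions listed above, written as Azure location names.

```python
import json

# Enforcement regions from this article, written as Azure location names.
ENFORCEMENT_REGIONS = {"francecentral", "canadacentral"}


def policy_enforcement_ready(feature_json: str, storage_location: str) -> bool:
    """True if the feature is Registered and the account is in an enforcement region."""
    state = json.loads(feature_json)["properties"]["state"]
    location = storage_location.replace(" ", "").lower()  # "France Central" -> "francecentral"
    return state == "Registered" and location in ENFORCEMENT_REGIONS
```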
Follow this configuration guide to enable access policies on an Azure Storage account
Next steps
Now that you have registered your source, follow the guides below to learn more about Purview and your data.