Azure Data Lake Storage Gen2 connector
This article is for Microsoft 365 administrators or anyone who configures, runs, and monitors an Azure Data Lake Storage Gen2 connector. It gives an overview of the connector configuration, capabilities, limitations, and troubleshooting techniques. In this article, Azure Storage is used as a generic term for Azure Blob Storage and Azure Data Lake Storage Gen2.
Connect to a data source
Primary storage connection string
On the Authentication and config screen, provide the Primary Storage Connection String. This string is required to allow access to your storage account. To find your connection string, go to the Azure portal and navigate to the Keys section of the relevant Azure Storage account. Copy the connection string and paste it into the appropriate field on the screen.
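Before pasting a connection string, it can help to confirm which parameters it contains, in particular whether it carries an AccountKey. The following is a minimal sketch, assuming the usual semicolon-separated `Key=Value` layout of Azure Storage connection strings. The function names and the placeholder values are hypothetical and are not part of the connector.

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    parts: dict[str, str] = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError("Malformed segment: expected Key=Value")
        parts[key] = value
    return parts


def uses_account_key(conn_str: str) -> bool:
    """Return True if the string carries a non-empty AccountKey parameter."""
    return bool(parse_connection_string(conn_str).get("AccountKey"))


# Placeholder values for illustration only, not real credentials.
example = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=examplestorage;"
    "AccountKey=ZmFrZS1rZXk=;"
    "EndpointSuffix=core.windows.net"
)

if __name__ == "__main__":
    print(sorted(parse_connection_string(example)))
    print(uses_account_key(example))
```

If `uses_account_key` returns False for your string, the role-based access described next is the path to follow.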
If you prefer not to provide the AccountKey (a parameter in the primary storage connection string), you need to grant the Graph Connectors Service the following roles on your storage account:
- Storage Blob Data Reader
- Storage Queue Data Contributor
- Storage Blob Delegator (only for hierarchical storage)
Navigate to the Access Control tab of your Azure Storage account, and follow the instructions there to grant access to the following app:
- First Party App ID: 56c1da01-2129-48f7-9355-af6d59d42766
- First Party App Name: Graph Connector Service
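The role assignments above are described for the Access Control tab in the Azure portal. As an alternative, they could be scripted with the Azure CLI. The sketch below only builds the `az role assignment create` command strings so you can review them before running anything. It assumes the app ID is accepted as `--assignee` in your tenant, and the scope shown is a placeholder resource ID that you must replace. The helper name is hypothetical.

```python
import shlex

# App ID of the Graph Connector Service, as listed in this article.
GCS_APP_ID = "56c1da01-2129-48f7-9355-af6d59d42766"

DELEGATOR_ROLE = "Storage Blob Delegator"  # only for hierarchical storage
ROLES = [
    "Storage Blob Data Reader",
    "Storage Queue Data Contributor",
    DELEGATOR_ROLE,
]


def role_assignment_commands(scope: str, hierarchical: bool = True) -> list[str]:
    """Build one 'az role assignment create' command per required role."""
    roles = [r for r in ROLES if hierarchical or r != DELEGATOR_ROLE]
    return [
        shlex.join([
            "az", "role", "assignment", "create",
            "--assignee", GCS_APP_ID,
            "--role", role,
            "--scope", scope,
        ])
        for role in roles
    ]


if __name__ == "__main__":
    # Placeholder scope: substitute your own subscription, group, and account.
    scope = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/<account-name>"
    )
    for command in role_assignment_commands(scope):
        print(command)
```

Printing the commands rather than executing them keeps the sketch free of side effects. Verify the flags against `az role assignment create --help` before use.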
Storage account and queue notifications (Optional)
Support for processing changes in real time in the Graph Connectors Service might be added in the future. In that case, the service will monitor Azure Storage change notifications stored in a queue. You'll need to create a queue in the same storage account that holds your data.
After you create a queue, go to the Events tab on the queue page to configure an event subscription. Choose all the Blob events that the queue should receive, and connect the queue to the Azure Storage account.
Manage search permissions
Azure Data Lake Gen 2
On the Manage search permissions screen, you can choose to ingest the Access Control Lists (ACLs) from your Azure Data Lake Gen 2 Storage account. When these search permissions are set, search content is trimmed based on the permissions assigned to the signed-in Azure Active Directory user searching the content. Alternatively, you can choose to make all the content indexed from your storage account visible to everyone in your organization. In this case, everyone in your organization will have access to all the data in your storage account.
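To illustrate what trimming by permissions means in practice, here is a conceptual sketch. It is not the service's implementation: the `IndexedItem` type, its fields, and the principal identifiers are all hypothetical, chosen only to show the difference between ACL-based visibility and visibility to everyone.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    """Hypothetical stand-in for an indexed file and its ingested ACL."""
    path: str
    allowed_principals: set[str] = field(default_factory=set)
    visible_to_everyone: bool = False


def trim_results(items: list[IndexedItem], user_principals: set[str]) -> list[IndexedItem]:
    """Keep only items the signed-in user is allowed to see."""
    return [
        item for item in items
        if item.visible_to_everyone or (item.allowed_principals & user_principals)
    ]


if __name__ == "__main__":
    items = [
        IndexedItem("finance/q1.csv", {"group-finance"}),
        IndexedItem("hr/handbook.pdf", visible_to_everyone=True),
        IndexedItem("eng/design.md", {"user-alice", "group-eng"}),
    ]
    # A user is represented here by their own ID plus their group IDs.
    for item in trim_results(items, {"user-alice", "group-eng"}):
        print(item.path)
```

With ACL ingestion enabled, each item behaves like the first and third entries. With the everyone option, every item behaves like the second.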
Azure Blob Storage
For a connection to Azure Blob Storage, all the content indexed from the configured source is visible to everyone in your organization. Access control lists are not supported at the blob level in Azure Blob Storage.
Manage search permissions
The Azure Data Lake Storage Gen2 connector supports two search permission settings: Everyone, or Only people with access to this data source. Depending on the setting, indexed data that appears in the search results is visible either to all users in the organization or only to users who have access to each item.
Assign property labels
You can assign a source property to each label by choosing from a menu of options. While this step is not mandatory, having some property labels will improve the search relevance and ensure more accurate search results for end users.
On the Manage Schema screen, you have the option to change the schema attributes (queryable, searchable, retrievable, and refinable) associated with the properties, add optional aliases, and choose the Content property.
Set the refresh schedule
On the Refresh Settings screen, you can set the incremental crawl interval and the full crawl interval. The default intervals for the Azure Data Lake Storage Gen2 connector are 15 minutes for an incremental crawl and one week for a full crawl.
A published connection for Azure Blob Storage cannot be reconfigured for an Azure Data Lake Storage Gen2 source, and vice versa. In such scenarios, configure a new connection instead.