Azure Data Lake Storage Gen2 Graph connector
The Azure Data Lake Storage Gen2 Graph connector allows users in your organization to search for files stored in Azure Blob Storage and Azure Data Lake Gen 2 Storage accounts.
Note
Read the Setup your Graph connector article to understand the general Graph connectors setup process.
This article is for anyone who configures, runs, and monitors an Azure Data Lake Storage Gen2 connector. It supplements the general setup process and covers the instructions that apply only to the Azure Data Lake Storage Gen2 connector. This article also includes information about Limitations.
In the article, we use Azure Storage as a generic term for Azure Blob Storage and Azure Data Lake Gen 2 Storage.
Step 1: Add a Graph connector in the Microsoft 365 admin center
Follow the general setup instructions.
Step 2: Name the connection
Follow the general setup instructions.
Step 3: Configure the connection settings
Enter your primary storage connection string. This string is required to allow access to your storage account. To find your connection string, go to the Azure portal and navigate to the Keys section of your relevant Azure Storage account.
If you prefer not to provide the AccountKey (a parameter in the primary storage connection string), grant the Graph Connectors Service the following roles instead:
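As an illustration of what the connection string contains, the following is a minimal sketch that splits a string into its parameters and checks whether it carries an AccountKey. It assumes the usual semicolon-separated Key=Value layout of Azure Storage connection strings. The function names, the account name, and the key value are placeholders invented for this example, not part of the connector.

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' segments into a dictionary."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def has_account_key(conn_str: str) -> bool:
    """Return True when the string includes a non-empty AccountKey parameter."""
    return bool(parse_connection_string(conn_str).get("AccountKey"))


# Placeholder values only; never hard-code a real key.
example = (
    "DefaultEndpointsProtocol=https;AccountName=examplestorage;"
    "AccountKey=PLACEHOLDERKEY==;EndpointSuffix=core.windows.net"
)
print(parse_connection_string(example)["AccountName"])
print(has_account_key(example))
```

A check like this can help confirm which form of the string you are about to paste into the connection settings.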
- Storage Blob Data Reader
- Storage Queue Data Contributor
- Storage Blob Delegator
Navigate to the Access Control tab of your Azure Storage account, and follow the instructions there to grant access to the following app:
- First Party App ID: 56c1da01-2129-48f7-9355-af6d59d42766
- First Party App Name: Graph Connector Service
Storage account and queue notifications (Optional)
Support for processing changes in real time might be added to the Graph Connectors Service in the future. In that case, the service will monitor Azure Storage change notifications stored in a queue. You'll need to create the queue in the same Azure Storage account that you are indexing.
After you create a queue, go to the Events tab on the queue page to configure Event Subscription. Choose all the Blob events that the queue will receive, and connect the queue to the Azure Storage account.
Step 4: Assign property labels
You can assign a source property to each label by choosing from a menu of options. While this step isn't mandatory, having some property labels will improve the search relevance and ensure better search results for end users.
Step 5: Manage schema
On the Manage Schema screen, you can change the schema attributes associated with the properties. The options are Query, Search, Retrieve, and Refine. You can also add optional aliases and choose the Content property.
Step 6: Manage search permissions
Azure Data Lake Gen 2
You can choose to ingest the Access Control Lists (ACLs) from your Azure Data Lake Gen 2 Storage account. When these search permissions are set, search content is trimmed based on the permissions of the user signed in to Azure Active Directory. Alternatively, you can choose to make all the content indexed from your storage account visible to everyone in your organization. In this case, everyone in your organization will have access to all the data in your storage account.
The Azure Data Lake Storage Gen2 Graph connector supports search permissions visible to Everyone, or Only people with access to this data source. Indexed data that appears in the search results could be visible to users in the organization who have access to each item.
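The difference between the two permission options can be pictured with a small conceptual sketch. This is not how the service is implemented. The IndexedItem type, the visible_items function, and the user identifiers are hypothetical, and real ACLs also involve groups and permission bits that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    """Hypothetical indexed file with the users its ACL grants read access to."""
    name: str
    allowed_users: frozenset = frozenset()


def visible_items(items, user_id, everyone=False):
    """Return the item names a user would see under each permission option.

    everyone=True models the 'Everyone' option; otherwise results are
    trimmed to the items whose ACL includes the signed-in user.
    """
    if everyone:
        return [item.name for item in items]
    return [item.name for item in items if user_id in item.allowed_users]


items = [
    IndexedItem("budget.xlsx", frozenset({"alice"})),
    IndexedItem("handbook.docx", frozenset({"alice", "bob"})),
]
print(visible_items(items, "bob"))
print(visible_items(items, "bob", everyone=True))
```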
Azure Blob Storage
For a connection to Azure Blob Storage, all the content indexed from the configured source is visible to everyone in your organization. Access control lists aren't supported at Blob level in Azure Blob Storage.
Step 7: Set the refresh schedule
On the Refresh Settings screen, you can set the incremental crawl interval and the full crawl interval. The default intervals for the Azure Data Lake Storage Gen2 connector are 15 minutes for an incremental crawl and one week for a full crawl.
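To get a feel for what the default intervals imply, the short sketch below computes how many incremental crawls fit between two full crawls. The constant and function names are made up for this example.

```python
from datetime import timedelta

# Default intervals stated for this connector.
INCREMENTAL_INTERVAL = timedelta(minutes=15)
FULL_INTERVAL = timedelta(weeks=1)


def incremental_crawls_per_full(
    incremental: timedelta = INCREMENTAL_INTERVAL,
    full: timedelta = FULL_INTERVAL,
) -> int:
    """Number of whole incremental intervals that fit in one full-crawl interval."""
    return full // incremental


print(incremental_crawls_per_full())  # 672 with the defaults
```

Lengthening the incremental interval reduces this count, which trades freshness of search results for less load on the storage account.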
Step 8: Review connection
Follow the general setup instructions.
Limitations
A published connection for Azure Blob Storage can't be reconfigured to use an Azure Data Lake Storage Gen2 source, or the other way around. In such scenarios, configure a new connection instead.
Also, a file must be 4 MB or smaller to be crawled. File types currently supported are:
- Word (.docx, .docm, .dotx, .dotm)
- PowerPoint (.pptm, .pptx, .potm, .potx, .ppam, .ppsm, .ppsx)
- Excel (.xlsx, .xlsm)
- Legacy Office formats (.doc, .dot, etc.)
- Text (.txt)
- HTML
Binary files like images (.jpg, .bmp, etc.) aren't supported. For example, if a .docx file contains only images, it might be skipped because no text content can be extracted from it.
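Before setting up a connection, it can be useful to estimate which files fall within these limits. The following is a rough sketch of such a check, not the connector's actual logic. It assumes that 1 MB means 1024 × 1024 bytes and that HTML files use the .html or .htm extension. The legacy Office entry lists only the two extensions named above, and the function name is hypothetical.

```python
import os

# Assumption: 4 MB interpreted as 4 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 4 * 1024 * 1024

SUPPORTED_EXTENSIONS = {
    ".docx", ".docm", ".dotx", ".dotm",                           # Word
    ".pptm", ".pptx", ".potm", ".potx", ".ppam", ".ppsm", ".ppsx",  # PowerPoint
    ".xlsx", ".xlsm",                                             # Excel
    ".doc", ".dot",                                               # legacy Office (named ones only)
    ".txt",                                                       # text
    ".html", ".htm",                                              # HTML (extensions assumed)
}


def is_crawl_candidate(file_name: str, size_bytes: int) -> bool:
    """Return True if the file type is listed as supported and the size is within the limit."""
    extension = os.path.splitext(file_name)[1].lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES


print(is_crawl_candidate("report.docx", 120_000))
print(is_crawl_candidate("photo.jpg", 120_000))
```

A file that passes this check can still be skipped if no text content can be extracted from it, as in the image-only .docx case described above.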