Azure Data Lake Storage Gen2 Graph connector

The Azure Data Lake Storage Gen2 Graph connector allows users in your organization to search for files stored in Azure Blob Storage and Azure Data Lake Gen 2 Storage accounts.

Note

Read the Setup your Graph connector article to understand the general Graph connectors setup process.

This article is for anyone who configures, runs, and monitors an Azure Data Lake Storage Gen2 connector. It supplements the general setup process and covers instructions that apply only to the Azure Data Lake Storage Gen2 connector. This article also includes information about Limitations.

In this article, we use Azure Storage as a generic term for Azure Blob Storage and Azure Data Lake Gen 2 Storage.

Step 1: Add a Graph connector in the Microsoft 365 admin center

Follow the general setup instructions.

Step 2: Name the connection

Follow the general setup instructions.

Step 3: Configure the connection settings

Enter your primary storage connection string. This string is required to allow access to your storage account. To find your connection string, go to the Azure portal and navigate to the Keys section of the relevant Azure Storage account.
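To make the shape of a connection string concrete, here is a minimal Python sketch that splits one into its parameters and checks whether it carries an AccountKey. The account name and key below are placeholders rather than real credentials, and the helper name parse_connection_string is hypothetical; it is not part of any Azure SDK.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    settings = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: account keys are base64 text
        # and may themselves end in '=' padding.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {part!r}")
        settings[key] = value
    return settings


# Placeholder values for illustration only.
EXAMPLE = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=examplestorage;"
    "AccountKey=ZmFrZS1rZXk=;"
    "EndpointSuffix=core.windows.net"
)

settings = parse_connection_string(EXAMPLE)
has_account_key = "AccountKey" in settings
print(sorted(settings))
print("AccountKey present:", has_account_key)
```

A check like this can help confirm, before you paste the string into the admin center, whether the AccountKey parameter is included.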

If you prefer not to provide the AccountKey (a parameter in the primary storage connection string), assign the following roles to the Graph Connectors Service instead:

  • Storage Blob Data Reader
  • Storage Queue Data Contributor
  • Storage Blob Delegator

Navigate to the Access Control tab of your Azure Storage account, and follow the instructions there to grant access to the following app:

  • First Party App ID: 56c1da01-2129-48f7-9355-af6d59d42766
  • First Party App Name: Graph Connector Service
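If you would rather script the role assignments than use the Access Control tab, the following Python sketch builds the equivalent Azure CLI commands without running them. It assumes the Azure CLI is installed and that the app ID above resolves to a service principal in your tenant, which you should verify first. The subscription, resource group, and storage account values are placeholders.

```python
import shlex

# App ID and role names are taken from this article.
APP_ID = "56c1da01-2129-48f7-9355-af6d59d42766"
ROLES = [
    "Storage Blob Data Reader",
    "Storage Queue Data Contributor",
    "Storage Blob Delegator",
]


def storage_account_scope(subscription_id, resource_group, account_name):
    """Build the Azure resource ID that scopes the assignment to one account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_assignment_commands(scope):
    """Return one 'az role assignment create' command line per role."""
    return [
        shlex.join([
            "az", "role", "assignment", "create",
            "--assignee", APP_ID,
            "--role", role,
            "--scope", scope,
        ])
        for role in ROLES
    ]


# Placeholder identifiers: replace with your own values before use.
scope = storage_account_scope(
    "<subscription-id>", "<resource-group>", "<storage-account>"
)
commands = role_assignment_commands(scope)
for command in commands:
    print(command)
```

Printing the commands instead of executing them lets you review the scope and role names before applying anything to your subscription.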

Storage account and queue notifications (Optional)

Support for processing changes in real time might be added to the Graph Connectors Service in the future. In that case, the service will monitor Azure Storage change notifications stored in a queue. To prepare for this, create a queue in the same Azure Storage account.

After you create a queue, go to the Events tab on the queue page to configure Event Subscription. Choose all the Blob events that the queue will receive, and connect the queue to the Azure Storage account.

Step 4: Assign property labels

You can assign a source property to each label by choosing from a menu of options. While this step isn't mandatory, having some property labels will improve the search relevance and ensure better search results for end users.

Step 5: Manage schema

On the Manage Schema screen, you can change the schema attributes associated with the properties. The options are Query, Search, Retrieve, and Refine. You can also add optional aliases and choose the Content property.

Step 6: Manage search permissions

Azure Data Lake Gen 2

You can choose to ingest the Access Control Lists (ACLs) from your Azure Data Lake Gen 2 Storage account. When these search permissions are set, search content is trimmed based on the permissions of the user signed in to Azure Active Directory. Alternatively, you can choose to make all the content indexed from your storage account visible to everyone in your organization. In this case, everyone in your organization will have access to all the data in your storage account.

The Azure Data Lake Storage Gen2 Graph connector supports search permissions visible to Everyone, or Only people with access to this data source. With the second option, indexed data appears in search results only for users in the organization who have access to each item.
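The difference between the two permission options can be illustrated with a small Python sketch. This is a conceptual model only, not the connector's implementation: the IndexedItem type, the mode constants, and the principal names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two options described above.
EVERYONE = "everyone"
ACL_ONLY = "only_people_with_access"


@dataclass
class IndexedItem:
    name: str
    # Users or groups granted access by the ingested ACL.
    allowed_principals: frozenset = field(default_factory=frozenset)


def visible_items(items, user_principals, mode):
    """Return the items a user would see under the given permission mode."""
    if mode == EVERYONE:
        return list(items)
    if mode != ACL_ONLY:
        raise ValueError(f"Unknown mode: {mode!r}")
    # Security trimming: keep an item only if the user, or a group the
    # user belongs to, appears in the item's ACL.
    return [i for i in items if i.allowed_principals & user_principals]


index = [
    IndexedItem("budget.xlsx", frozenset({"finance-group"})),
    IndexedItem("handbook.pdf", frozenset({"all-staff", "finance-group"})),
]
user = {"alice", "all-staff"}

print([i.name for i in visible_items(index, user, ACL_ONLY)])
print([i.name for i in visible_items(index, user, EVERYONE)])
```

In this model the same user sees one item with trimming enabled and both items when content is visible to everyone, which is the trade-off to weigh when choosing the setting.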

Azure Blob Storage

For a connection to Azure Blob Storage, all the content indexed from the configured source is visible to everyone in your organization. Access control lists aren't supported at the blob level in Azure Blob Storage.

Step 7: Set the refresh schedule

On the Refresh Settings screen, you can set the incremental crawl interval and the full crawl interval. The default intervals for the Azure Data Lake Storage Gen2 connector are 15 minutes for an incremental crawl and one week for a full crawl.
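As a rough illustration of what the default intervals imply, the Python sketch below computes the next crawl time and the number of incremental crawls that fit into one full-crawl cycle. The function names are hypothetical, and the sketch assumes crawls start exactly on schedule, which the service doesn't guarantee.

```python
from datetime import datetime, timedelta

# Default intervals stated in this article.
INCREMENTAL_INTERVAL = timedelta(minutes=15)
FULL_INTERVAL = timedelta(weeks=1)


def next_crawl(last_run: datetime, interval: timedelta) -> datetime:
    """Return the earliest time the next crawl of this kind would start."""
    return last_run + interval


def incremental_crawls_per_full_cycle(
    incremental: timedelta = INCREMENTAL_INTERVAL,
    full: timedelta = FULL_INTERVAL,
) -> int:
    """Count how many incremental intervals fit into one full interval."""
    return full // incremental


last_run = datetime(2024, 1, 1, 0, 0)
print(next_crawl(last_run, INCREMENTAL_INTERVAL))
print(next_crawl(last_run, FULL_INTERVAL))
print(incremental_crawls_per_full_cycle())
```

An estimate like this can help you judge whether a shorter or longer interval suits how often your storage content changes.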

Step 8: Review connection

Follow the general setup instructions.

Limitations

A published connection for Azure Blob Storage can't be reconfigured to use an Azure Data Lake Storage Gen2 source, or the other way around. In such cases, configure a new connection.

Also, a file must be 4 MB or smaller to be crawled. The currently supported file types are:

  • Word (.docx, .docm, .dotx, .dotm)
  • PowerPoint (.pptm, .pptx, .potm, .potx, .ppam, .ppsm, .ppsx)
  • Excel (.xlsx, .xlsm)
  • Legacy Office formats (.doc, .dot, etc.)
  • Text (.txt)
  • HTML
  • PDF

Binary files such as images (.jpg, .bmp, etc.) aren't supported. Likewise, a .docx file that contains only images might be skipped because it returns no content.
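The size and file type limits above can be checked before you upload content. The Python sketch below is a pre-check under stated assumptions: 4 MB is taken to mean 4 × 1024 × 1024 bytes, HTML is matched by .html and .htm, and only the two legacy Office extensions named in the list are included because the full set isn't specified here. The function name is_crawlable is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: "4 MB" is interpreted as 4 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 4 * 1024 * 1024

SUPPORTED_EXTENSIONS = {
    ".docx", ".docm", ".dotx", ".dotm",                           # Word
    ".pptm", ".pptx", ".potm", ".potx", ".ppam", ".ppsm", ".ppsx",  # PowerPoint
    ".xlsx", ".xlsm",                                             # Excel
    ".doc", ".dot",           # legacy Office (only those named above)
    ".txt",                   # text
    ".html", ".htm",          # HTML (extensions are an assumption)
    ".pdf",                   # PDF
}


def is_crawlable(blob_name: str, size_bytes: int) -> bool:
    """Check a blob against the size limit and the supported file types."""
    extension = PurePosixPath(blob_name).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and size_bytes <= MAX_SIZE_BYTES


print(is_crawlable("reports/q1.docx", 120_000))
print(is_crawlable("images/logo.jpg", 20_000))
print(is_crawlable("archive/manual.pdf", 5 * 1024 * 1024))
```

Passing this check is necessary but not sufficient: a supported file can still be skipped if no content can be extracted from it, as with an image-only .docx file.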