Index data from Azure Cosmos DB using SQL or MongoDB APIs


SQL API is generally available. MongoDB API support is currently in public preview under supplemental Terms of Use. Request access, and after access is enabled, use a preview REST API (2020-06-30-preview or later) to access your data. There is currently limited portal support, and no .NET SDK support.

This article shows you how to configure an Azure Cosmos DB indexer to extract content and make it searchable in Azure Cognitive Search. This workflow creates an Azure Cognitive Search index and loads it with existing text extracted from Azure Cosmos DB.

Because terminology can be confusing, it's worth noting that Azure Cosmos DB indexing and Azure Cognitive Search indexing are distinct operations, unique to each service. Before you start Azure Cognitive Search indexing, your Azure Cosmos DB database must already exist and contain data.

The Cosmos DB indexer in Azure Cognitive Search can crawl Azure Cosmos DB items accessed through the SQL API or the MongoDB API (preview).


Only Cosmos DB collections with an indexing policy set to Consistent are supported by Azure Cognitive Search. Indexing collections with a Lazy indexing policy is not recommended and may result in missing data. Collections with indexing disabled are not supported.

Use the portal


The portal currently supports the SQL API and MongoDB API (preview).

The easiest method for indexing Azure Cosmos DB items is to use a wizard in the Azure portal. By sampling data and reading metadata on the container, the Import data wizard in Azure Cognitive Search can create a default index, map source fields to target index fields, and load the index in a single operation. Depending on the size and complexity of source data, you could have an operational full text search index in minutes.

We recommend using the same region or location for both Azure Cognitive Search and Azure Cosmos DB for lower latency and to avoid bandwidth charges.

Step 1 - Prepare source data

You should have a Cosmos DB account, an Azure Cosmos DB database mapped to the SQL API or MongoDB API (preview), and content in the database.

Make sure your Cosmos DB database contains data. The Import data wizard reads metadata and performs data sampling to infer an index schema, but it also loads data from Cosmos DB. If data is missing, the wizard stops with the following error: "Error detecting index schema from data source: Could not build a prototype index because datasource 'emptycollection' returned no data".

Step 2 - Start Import data wizard

You can start the wizard from the command bar on the Azure Cognitive Search service page. Alternatively, if you're connecting to the Cosmos DB SQL API, you can click Add Azure Cognitive Search in the Settings section of your Cosmos DB account's left navigation pane.

Screenshot of the Import data command

Step 3 - Set the data source

In the data source page, the source must be Cosmos DB, with the following specifications:

  • Name is the name of the data source object. Once created, you can choose it for other workloads.

  • Cosmos DB account should be in one of the following formats:

    1. The primary or secondary connection string from Cosmos DB with the following format: AccountEndpoint=https://<Cosmos DB account name>;AccountKey=<Cosmos DB auth key>;.
      • For version 3.2 and version 3.6 MongoDB collections use the following format for the Cosmos DB account in the Azure portal: AccountEndpoint=https://<Cosmos DB account name>;AccountKey=<Cosmos DB auth key>;ApiKind=MongoDb
    2. A managed identity connection string with the following format that does not include an account key: ResourceId=/subscriptions/<your subscription ID>/resourceGroups/<your resource group name>/providers/Microsoft.DocumentDB/databaseAccounts/<your cosmos db account name>/;(ApiKind=[api-kind];). To use this connection string format, follow the instructions for Setting up an indexer connection to a Cosmos DB database using a managed identity.
  • Database is an existing database from the account.

  • Collection is a container of documents. Documents must exist in order for import to succeed.

  • Query can be blank if you want all documents, otherwise you can input a query that selects a document subset. Query is only available for the SQL API.

    Cosmos DB data source definition

Step 4 - Skip the "Enrich content" page in the wizard

Adding cognitive skills (or enrichment) is not an import requirement. Unless you have a specific need to add AI enrichment to your indexing pipeline, you can skip this step.

To skip this step, click the blue Next and Skip buttons at the bottom of the page.

Step 5 - Set index attributes

In the Index page, you should see a list of fields with a data type and a series of checkboxes for setting index attributes. The wizard can generate a fields list based on metadata and by sampling the source data.

You can bulk-select attributes by clicking the checkbox at the top of an attribute column. Choose Retrievable and Searchable for every field that should be returned to a client app and subject to full text search processing. You'll notice that integers are not full text or fuzzy searchable (numbers are evaluated verbatim and are often useful in filters).

Review the description of index attributes and language analyzers for more information.

Take a moment to review your selections. Once you run the wizard, physical data structures are created and you won't be able to edit these fields without dropping and recreating all objects.

Cosmos DB index definition

Step 6 - Create indexer

Fully specified, the wizard creates three distinct objects in your search service. A data source object and index object are saved as named resources in your Azure Cognitive Search service. The last step creates an indexer object. Naming the indexer allows it to exist as a standalone resource, which you can schedule and manage independently of the index and data source objects created in the same wizard sequence.

If you are not familiar with indexers, an indexer is a resource in Azure Cognitive Search that crawls an external data source for searchable content. The output of the Import data wizard is an indexer that crawls your Cosmos DB data source, extracts searchable content, and imports it into an index on Azure Cognitive Search.

The following screenshot shows the default indexer configuration. You can switch to Once if you want to run the indexer one time. Click Submit to run the wizard and create all objects. Indexing commences immediately.

Cosmos DB indexer definition

You can monitor data import in the portal pages. Progress notifications indicate indexing status and how many documents are uploaded.

When indexing is complete, you can use Search explorer to query your index.


If you don't see the data you expect, you might need to set more attributes on more fields. Delete the index and indexer you just created, and step through the wizard again, modifying your selections for index attributes in step 5.


Use REST APIs

You can use the REST API to index Azure Cosmos DB data, following a three-part workflow common to all indexers in Azure Cognitive Search: create a data source, create an index, create an indexer. In the process below, data extraction from Cosmos DB starts when you submit the Create Indexer request.

Earlier in this article it is mentioned that Azure Cosmos DB indexing and Azure Cognitive Search indexing are distinct operations. For Cosmos DB indexing, by default all documents are automatically indexed. If you turn off automatic indexing, documents can be accessed only through their self-links or by queries using the document ID. Azure Cognitive Search indexing requires Cosmos DB automatic indexing to be turned on in the collection that will be indexed by Azure Cognitive Search.


Azure Cosmos DB is the next generation of DocumentDB. Previously, with API version 2017-11-11, you could use the documentdb syntax, which meant that you could specify your data source type as either cosmosdb or documentdb. Starting with API version 2019-05-06, both the Azure Cognitive Search APIs and the portal support only the cosmosdb syntax, as instructed in this article. This means that the data source type must be cosmosdb if you would like to connect to a Cosmos DB endpoint.

Step 1 - Assemble inputs for the request

For each request, you must provide the service name and admin key for Azure Cognitive Search (in the POST header), and the connection string for your Cosmos DB account. You can use Postman or Visual Studio Code to send HTTP requests to Azure Cognitive Search.

Copy the following three values for use with your request:

  • Azure Cognitive Search service name
  • Azure Cognitive Search admin key
  • Cosmos DB connection string

You can find these values in the portal:

  1. In the portal pages for Azure Cognitive Search, copy the search service URL from the Overview page.

  2. In the left navigation pane, click Keys and then copy either the primary or secondary key.

  3. Switch to the portal pages for your Cosmos DB account. In the left navigation pane, under Settings, click Keys. This page provides a URI, two sets of connection strings, and two sets of keys. Copy one of the connection strings to Notepad.
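The three values above are all that the requests in the following steps need. As a small illustration, here's a sketch in Python of the common headers every request carries; the helper name and placeholder key are ours, not part of the service API:

```python
def search_request_headers(admin_key):
    """Common headers for Azure Cognitive Search REST requests.

    The api-key header carries the admin key copied from the portal's
    Keys page; request bodies are always JSON.
    """
    return {
        "Content-Type": "application/json",
        "api-key": admin_key,
    }

# Placeholder value; substitute your real admin key before sending requests.
headers = search_request_headers("<your-admin-key>")
```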

Step 2 - Create a data source

A data source specifies the data to index, credentials, and policies for identifying changes in the data (such as modified or deleted documents inside your collection). The data source is defined as an independent resource so that it can be used by multiple indexers.

To create a data source, formulate a POST request:

    POST https://[service name].search.windows.net/datasources?api-version=2020-06-30
    Content-Type: application/json
    api-key: [Search service admin key]

    {
        "name": "mycosmosdbdatasource",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "AccountEndpoint=https://<Cosmos DB account name>;AccountKey=myCosmosDbAuthKey;Database=myCosmosDbDatabaseId"
        },
        "container": { "name": "myCollection", "query": null },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        }
    }

The body of the request contains the data source definition, which should include the following fields:

  • name: Required. Choose any name to represent your data source object.

  • type: Required. Must be cosmosdb.

  • credentials: Required. Must be a Cosmos DB connection string.

    For SQL collections, connection strings are in this format: AccountEndpoint=https://<Cosmos DB account name>;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>

    For version 3.2 and version 3.6 MongoDB collections, use the following format for the connection string: AccountEndpoint=https://<Cosmos DB account name>;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>;ApiKind=MongoDb

    Avoid port numbers in the endpoint URL. If you include the port number, Azure Cognitive Search will be unable to index your Azure Cosmos DB database.

  • container: Contains the following elements:

    name: Required. Specify the ID of the database collection to be indexed.
    query: Optional. You can specify a query to flatten an arbitrary JSON document into a flat schema that Azure Cognitive Search can index. Queries are not supported for the MongoDB API.

  • dataChangeDetectionPolicy: Recommended. See the Indexing changed documents section.

  • dataDeletionDetectionPolicy: Optional. See the Indexing deleted documents section.
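To make the shape of the definition concrete, the fields above can be assembled programmatically before being sent. A minimal sketch in Python; the function name is ours and the connection string is a placeholder:

```python
import json

# Illustrative helper (the name is ours) that assembles the data source
# definition described above into a Create Data Source request body.
def build_cosmosdb_datasource(name, connection_string, container_name, query=None):
    """Return the JSON body for a Create Data Source request."""
    return {
        "name": name,
        "type": "cosmosdb",  # required; must be exactly "cosmosdb"
        "credentials": {"connectionString": connection_string},
        # query must stay None (null) for the MongoDB API
        "container": {"name": container_name, "query": query},
        # Recommended: detect changed documents via the _ts timestamp
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }

# Placeholder connection string; substitute real values before sending.
body = build_cosmosdb_datasource(
    "mycosmosdbdatasource",
    "AccountEndpoint=https://<account>;AccountKey=<key>;Database=<db>",
    "myCollection",
)
print(json.dumps(body, indent=2))
```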

Using queries to shape indexed data

You can specify a SQL query to flatten nested properties or arrays, project JSON properties, and filter the data to be indexed.


Custom queries are not supported for the MongoDB API: the container.query parameter must be set to null or omitted.

Example document:

        "userId": 10001,
        "contact": {
            "firstName": "andy",
            "lastName": "hoh"
        "company": "microsoft",
        "tags": ["azure", "cosmosdb", "search"]

Filter query:

SELECT * FROM c WHERE = "microsoft" and c._ts >= @HighWaterMark ORDER BY c._ts

Flattening query:

SELECT, c.userId,,,, c._ts FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Projection query:

SELECT VALUE { "id", "Name", "Company", "_ts":c._ts } FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Array flattening query:

SELECT, c.userId, tag, c._ts FROM c JOIN tag IN c.tags WHERE c._ts >= @HighWaterMark ORDER BY c._ts


Queries using the DISTINCT keyword or GROUP BY clause are not supported. Azure Cognitive Search relies on SQL query pagination to fully enumerate the results of the query. Neither the DISTINCT keyword nor the GROUP BY clause is compatible with the continuation tokens used to paginate results.

Examples of unsupported queries:

SELECT DISTINCT, c.userId, c._ts FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts


SELECT TOP 4 COUNT(1) AS foodGroupCount, f.foodGroup FROM Food f GROUP BY f.foodGroup

Although Cosmos DB has a workaround to support SQL query pagination with the DISTINCT keyword by using the ORDER BY clause, it is not compatible with Azure Cognitive Search. The query will return a single JSON value, whereas Azure Cognitive Search expects a JSON object.

-- The following query returns a single JSON value and isn't supported by Azure Cognitive Search
SELECT DISTINCT VALUE FROM c ORDER BY

Step 3 - Create a target search index

Create a target Azure Cognitive Search index if you don’t have one already. The following example creates an index with an ID and description field:

    POST https://[service name].search.windows.net/indexes?api-version=2020-06-30
    Content-Type: application/json
    api-key: [Search service admin key]

    {
       "name": "mysearchindex",
       "fields": [{
         "name": "id",
         "type": "Edm.String",
         "key": true,
         "searchable": false
       }, {
         "name": "description",
         "type": "Edm.String",
         "filterable": false,
         "searchable": true,
         "sortable": false,
         "facetable": false,
         "retrievable": true
       }]
    }

Ensure that the schema of your target index is compatible with the schema of the source JSON documents or the output of your custom query projection.


For partitioned collections, the default document key is Azure Cosmos DB's _rid property, which Azure Cognitive Search automatically renames to rid because field names cannot start with an underscore character. Also, Azure Cosmos DB _rid values contain characters that are invalid in Azure Cognitive Search keys. For this reason, the _rid values are Base64 encoded.

For MongoDB collections, Azure Cognitive Search automatically renames the _id property to id.

Mapping between JSON Data Types and Azure Cognitive Search Data Types

For each JSON data type, these are the compatible target index field types:

  • Bool: Edm.Boolean, Edm.String
  • Numbers that look like integers: Edm.Int32, Edm.Int64, Edm.String
  • Numbers that look like floating points: Edm.Double, Edm.String
  • String: Edm.String
  • Arrays of primitive types, for example ["a", "b", "c"]: Collection(Edm.String)
  • Strings that look like dates: Edm.DateTimeOffset, Edm.String
  • GeoJSON objects, for example { "type": "Point", "coordinates": [long, lat] }: Edm.GeographyPoint
  • Other JSON objects: N/A
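As a rough sketch of this mapping in code (Python; the function is illustrative and simpler than the wizard's actual inference logic):

```python
# Given a sampled JSON value (as a Python object), return the target
# index field types it is compatible with, mirroring the table above.
def compatible_edm_types(value):
    if isinstance(value, bool):  # check bool first: bool is a subtype of int
        return ["Edm.Boolean", "Edm.String"]
    if isinstance(value, int):
        return ["Edm.Int32", "Edm.Int64", "Edm.String"]
    if isinstance(value, float):
        return ["Edm.Double", "Edm.String"]
    if isinstance(value, str):
        # date-like strings would additionally map to Edm.DateTimeOffset
        return ["Edm.String"]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return ["Collection(Edm.String)"]
    if isinstance(value, dict) and value.get("type") == "Point":
        return ["Edm.GeographyPoint"]
    return []  # other JSON objects have no direct mapping
```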

Step 4 - Configure and run the indexer

Once the index and data source have been created, you're ready to create the indexer:

    POST https://[service name].search.windows.net/indexers?api-version=2020-06-30
    Content-Type: application/json
    api-key: [admin key]

    {
      "name" : "mycosmosdbindexer",
      "dataSourceName" : "mycosmosdbdatasource",
      "targetIndexName" : "mysearchindex",
      "schedule" : { "interval" : "PT2H" }
    }

This indexer runs every two hours (schedule interval is set to "PT2H"). To run an indexer every 30 minutes, set the interval to "PT30M". The shortest supported interval is 5 minutes. The schedule is optional - if omitted, an indexer runs only once when it's created. However, you can run an indexer on-demand at any time.
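The interval strings are ISO 8601 durations. A small helper (illustrative; the function name is ours) that builds the schedule object from a minute count and enforces the 5-minute minimum:

```python
# Build the indexer "schedule" object from an interval in minutes.
def indexer_schedule(minutes):
    if minutes < 5:
        raise ValueError("shortest supported interval is 5 minutes")
    if minutes % 60 == 0:
        interval = "PT%dH" % (minutes // 60)  # e.g. 120 -> "PT2H"
    else:
        interval = "PT%dM" % minutes          # e.g. 30 -> "PT30M"
    return {"interval": interval}
```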

For more details on the Create Indexer API, check out Create Indexer.

For more information about defining indexer schedules, see How to schedule indexers for Azure Cognitive Search.

Use .NET

The generally available .NET SDK has full parity with the generally available REST API. We recommend that you review the previous REST API section to learn concepts, workflow, and requirements. You can then refer to the following .NET API reference documentation to implement a Cosmos DB indexer in managed code.

Indexing changed documents

The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the HighWaterMarkChangeDetectionPolicy using the _ts (timestamp) property provided by Azure Cosmos DB, which is specified as follows:

        "@odata.type" : "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName" : "_ts"

Using this policy is highly recommended to ensure good indexer performance.

If you are using a custom query, make sure that the _ts property is projected by the query.

Incremental progress and custom queries

Incremental progress during indexing ensures that if indexer execution is interrupted by transient failures or an execution time limit, the indexer can pick up where it left off the next time it runs, instead of having to reindex the entire collection from scratch. This is especially important when indexing large collections.

To enable incremental progress when using a custom query, ensure that your query orders the results by the _ts column. This enables periodic check-pointing that Azure Cognitive Search uses to provide incremental progress in the presence of failures.

In some cases, even if your query contains an ORDER BY [collection alias]._ts clause, Azure Cognitive Search may not infer that the query is ordered by _ts. You can tell Azure Cognitive Search that the results are ordered by _ts by using the assumeOrderByHighWaterMarkColumn configuration property. To specify this hint, create or update your indexer as follows:

    {
     ... other indexer definition properties
     "parameters" : {
            "configuration" : { "assumeOrderByHighWaterMarkColumn" : true }
     }
    }

Indexing deleted documents

When documents are deleted from the collection, you normally want to delete them from the search index as well. The purpose of a data deletion detection policy is to efficiently identify deleted data items. Currently, the only supported policy is the Soft Delete policy (deletion is marked with a flag of some sort), which is specified as follows:

        "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName" : "the property that specifies whether a document was deleted",
        "softDeleteMarkerValue" : "the value that identifies a document as deleted"

If you are using a custom query, make sure that the property referenced by softDeleteColumnName is projected by the query.

The following example creates a data source with a soft-deletion policy:

    POST https://[service name].search.windows.net/datasources?api-version=2020-06-30
    Content-Type: application/json
    api-key: [Search service admin key]

    {
        "name": "mycosmosdbdatasource",
        "type": "cosmosdb",
        "credentials": {
            "connectionString": "AccountEndpoint=https://<Cosmos DB account name>;AccountKey=myCosmosDbAuthKey;Database=myCosmosDbDatabaseId"
        },
        "container": { "name": "myCosmosDbCollectionId" },
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts"
        },
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "isDeleted",
            "softDeleteMarkerValue": "true"
        }
    }

Next steps

Congratulations! You have learned how to integrate Azure Cosmos DB with Azure Cognitive Search using an indexer.