
How to index Cosmos DB data using an indexer in Azure Cognitive Search

Important

The SQL API is generally available. MongoDB API, Gremlin API, and Cassandra API support is currently in public preview. Preview functionality is provided without a service level agreement, and is not recommended for production workloads. For more information, see Supplemental Terms of Use for Microsoft Azure Previews. You can request access to the previews by filling out this form. The REST API version 2019-05-06-Preview provides preview features. There is currently limited portal support, and no .NET SDK support.

Warning

Only Cosmos DB collections with an indexing policy set to Consistent are supported by Azure Cognitive Search. Indexing collections with a Lazy indexing policy is not recommended and may result in missing data. Collections with indexing disabled are not supported.

This article shows you how to configure an Azure Cosmos DB indexer to extract content and make it searchable in Azure Cognitive Search. This workflow creates an Azure Cognitive Search index and loads it with existing text extracted from Azure Cosmos DB.

Because terminology can be confusing, it's worth noting that Azure Cosmos DB indexing and Azure Cognitive Search indexing are distinct operations, unique to each service. Before you start Azure Cognitive Search indexing, your Azure Cosmos DB database must already exist and contain data.

The Cosmos DB indexer in Azure Cognitive Search can crawl Azure Cosmos DB items accessed through different protocols.

Note

You can cast a vote on User Voice for the Table API if you'd like to see it supported in Azure Cognitive Search.

Use the portal

Note

The portal currently supports the SQL API and MongoDB API (preview).

The easiest method for indexing Azure Cosmos DB items is to use a wizard in the Azure portal. By sampling data and reading metadata on the container, the Import data wizard in Azure Cognitive Search can create a default index, map source fields to target index fields, and load the index in a single operation. Depending on the size and complexity of source data, you could have an operational full text search index in minutes.

We recommend using the same region or location for both Azure Cognitive Search and Azure Cosmos DB for lower latency and to avoid bandwidth charges.

1 - Prepare source data

You should have a Cosmos DB account, an Azure Cosmos DB database mapped to the SQL API, MongoDB API (preview), or Gremlin API (preview), and content in the database.

Make sure your Cosmos DB database contains data. The Import data wizard reads metadata and performs data sampling to infer an index schema, but it also loads data from Cosmos DB. If the data is missing, the wizard stops with this error: "Error detecting index schema from data source: Could not build a prototype index because datasource 'emptycollection' returned no data".

2 - Start Import data wizard

You can start the wizard from the command bar in the Azure Cognitive Search service page or, if you're connecting to the Cosmos DB SQL API, you can click Add Azure Cognitive Search in the Settings section of your Cosmos DB account's left navigation pane.

[Screenshot: Import data command in portal]

3 - Set the data source

In the data source page, the source must be Cosmos DB, with the following specifications:

  • Name is the name of the data source object. Once created, you can choose it for other workloads.

  • Cosmos DB account should be the primary or secondary connection string from Cosmos DB, with an AccountEndpoint and an AccountKey. For MongoDB collections, add ApiKind=MongoDb to the end of the connection string and separate it from the connection string with a semicolon. For the Gremlin API and Cassandra API, use the instructions for the REST API.

  • Database is an existing database from the account.

  • Collection is a container of documents. Documents must exist in order for import to succeed.

  • Query can be blank if you want all documents; otherwise, you can input a query that selects a document subset. Query is only available for the SQL API.

    [Screenshot: Cosmos DB data source definition]

4 - Skip the "Enrich content" page in the wizard

Adding cognitive skills (or enrichment) is not an import requirement. Unless you have a specific need to add AI enrichment to your indexing pipeline, you should skip this step.

To skip the step, click the blue buttons at the bottom of the page for "Next" and "Skip".

5 - Set index attributes

In the Index page, you should see a list of fields with a data type and a series of checkboxes for setting index attributes. The wizard can generate a fields list based on metadata and by sampling the source data.

You can bulk-select attributes by clicking the checkbox at the top of an attribute column. Choose Retrievable and Searchable for every field that should be returned to a client app and subject to full text search processing. You'll notice that integers are not full text or fuzzy searchable (numbers are evaluated verbatim and are often useful in filters).

Review the description of index attributes and language analyzers for more information.

Take a moment to review your selections. Once you run the wizard, physical data structures are created and you won't be able to edit these fields without dropping and recreating all objects.

[Screenshot: Cosmos DB index definition]

6 - Create indexer

Fully specified, the wizard creates three distinct objects in your search service. A data source object and index object are saved as named resources in your Azure Cognitive Search service. The last step creates an indexer object. Naming the indexer allows it to exist as a standalone resource, which you can schedule and manage independently of the index and data source objects created in the same wizard sequence.

If you are not familiar with indexers, an indexer is a resource in Azure Cognitive Search that crawls an external data source for searchable content. The output of the Import data wizard is an indexer that crawls your Cosmos DB data source, extracts searchable content, and imports it into an index on Azure Cognitive Search.

The following screenshot shows the default indexer configuration. You can switch to Once if you want to run the indexer one time. Click Submit to run the wizard and create all objects. Indexing commences immediately.

[Screenshot: Cosmos DB indexer definition]

You can monitor data import in the portal pages. Progress notifications indicate indexing status and how many documents are uploaded.

When indexing is complete, you can use Search explorer to query your index.

Note

If you don't see the data you expect, you might need to set more attributes on more fields. Delete the index and indexer you just created, and step through the wizard again, modifying your selections for index attributes in step 5.

Use REST APIs

You can use the REST API to index Azure Cosmos DB data, following a three-part workflow common to all indexers in Azure Cognitive Search: create a data source, create an index, create an indexer. Data extraction from Cosmos DB occurs when you submit the Create Indexer request. After this request is finished, you will have a queryable index.

Note

For indexing data from the Cosmos DB Gremlin API or Cosmos DB Cassandra API, you must first request access to the gated previews by filling out this form. Once your request is processed, you will receive instructions for how to use the REST API version 2019-05-06-Preview to create the data source.

As mentioned earlier in this article, Azure Cosmos DB indexing and Azure Cognitive Search indexing are distinct operations. For Cosmos DB indexing, by default all documents are automatically indexed except with the Cassandra API. If you turn off automatic indexing, documents can be accessed only through their self-links or by queries that use the document ID. Azure Cognitive Search indexing requires Cosmos DB automatic indexing to be turned on in the collection that will be indexed by Azure Cognitive Search. When signing up for the Cosmos DB Cassandra API indexer preview, you'll be given instructions on how to set up Cosmos DB indexing.

Warning

Azure Cosmos DB is the next generation of DocumentDB. Previously, with API version 2017-11-11, you could use the documentdb syntax. This meant that you could specify your data source type as cosmosdb or documentdb. Starting with API version 2019-05-06, both the Azure Cognitive Search APIs and the portal only support the cosmosdb syntax as instructed in this article. This means that the data source type must be cosmosdb if you would like to connect to a Cosmos DB endpoint.

1 - Assemble inputs for the request

For each request, you must provide the service name and admin key for Azure Cognitive Search (the admin key goes in the request header). To create the data source, you also need the connection string for your Cosmos DB account. You can use Postman to send HTTP requests to Azure Cognitive Search.

Copy the following three values into Notepad so that you can paste them into a request:

  • Azure Cognitive Search service name
  • Azure Cognitive Search admin key
  • Cosmos DB connection string

You can find these values in the portal:

  1. In the portal pages for Azure Cognitive Search, copy the search service URL from the Overview page.

  2. In the left navigation pane, click Keys and then copy either the primary or secondary key (they are equivalent).

  3. Switch to the portal pages for your Cosmos DB account. In the left navigation pane, under Settings, click Keys. This page provides a URI, two sets of connection strings, and two sets of keys. Copy one of the connection strings to Notepad.

2 - Create a data source

A data source specifies the data to index, credentials, and policies for identifying changes in the data (such as modified or deleted documents inside your collection). The data source is defined as an independent resource so that it can be used by multiple indexers.

To create a data source, formulate a POST request:

POST https://[service name].search.windows.net/datasources?api-version=2019-05-06
Content-Type: application/json
api-key: [Search service admin key]

{
    "name": "mycosmosdbdatasource",
    "type": "cosmosdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://myCosmosDbEndpoint.documents.azure.com;AccountKey=myCosmosDbAuthKey;Database=myCosmosDbDatabaseId"
    },
    "container": { "name": "myCollection", "query": null },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    }
}

The body of the request contains the data source definition, which should include the following fields:

  • name: Required. Choose any name to represent your data source object.

  • type: Required. Must be cosmosdb.

  • credentials: Required. Must be a Cosmos DB connection string.

    For SQL collections, connection strings are in this format:
    AccountEndpoint=<Cosmos DB endpoint url>;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>

    For MongoDB collections, add ApiKind=MongoDb to the connection string:
    AccountEndpoint=<Cosmos DB endpoint url>;AccountKey=<Cosmos DB auth key>;Database=<Cosmos DB database id>;ApiKind=MongoDb

    For Gremlin graphs and Cassandra tables, sign up for the gated indexer preview to get access to the preview and information about how to format the credentials.

    Avoid port numbers in the endpoint URL. If you include the port number, Azure Cognitive Search will be unable to index your Azure Cosmos DB database.

  • container: Contains the following elements:
    name: Required. Specify the ID of the database collection to be indexed.
    query: Optional. You can specify a query to flatten an arbitrary JSON document into a flat schema that Azure Cognitive Search can index. For the MongoDB API, Gremlin API, and Cassandra API, queries are not supported.

  • dataChangeDetectionPolicy: Recommended. See the "Indexing changed documents" section.

  • dataDeletionDetectionPolicy: Optional. See the "Indexing deleted documents" section.
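As a minimal sketch of the data source request described above, the following Python assembles the connection string and builds the POST request without sending it. The helper names (build_connection_string, build_datasource_request) and the placeholder service name and keys are assumptions made for illustration; only the URL shape, the headers, and the JSON fields come from this article.

```python
import json
import urllib.request
from urllib.parse import urlparse


def build_connection_string(endpoint, key, database, api_kind=None):
    """Assemble a Cosmos DB connection string in the format shown above."""
    # The article warns against port numbers in the endpoint URL.
    if urlparse(endpoint).port is not None:
        raise ValueError("endpoint URL must not include a port number")
    parts = [
        f"AccountEndpoint={endpoint}",
        f"AccountKey={key}",
        f"Database={database}",
    ]
    if api_kind:  # for example "MongoDb" for MongoDB collections
        parts.append(f"ApiKind={api_kind}")
    return ";".join(parts)


def build_datasource_request(service_name, admin_key, name, connection_string, collection):
    """Build (but do not send) the Create Data Source request (hypothetical helper)."""
    body = {
        "name": name,
        "type": "cosmosdb",
        "credentials": {"connectionString": connection_string},
        "container": {"name": collection, "query": None},
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }
    return urllib.request.Request(
        url=f"https://{service_name}.search.windows.net/datasources?api-version=2019-05-06",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Placeholder values only; substitute your own service name, keys, and database.
connection_string = build_connection_string(
    "https://myCosmosDbEndpoint.documents.azure.com", "myCosmosDbAuthKey", "myCosmosDbDatabaseId"
)
request = build_datasource_request(
    "myservice", "myAdminKey", "mycosmosdbdatasource", connection_string, "myCollection"
)
```

Passing the returned request to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a live service.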

Using queries to shape indexed data

You can specify a SQL query to flatten nested properties or arrays, project JSON properties, and filter the data to be indexed.

Warning

Custom queries are not supported for the MongoDB API, Gremlin API, and Cassandra API: the container.query parameter must be set to null or omitted. If you need to use a custom query, please let us know on User Voice.

Example document:

{
    "userId": 10001,
    "contact": {
        "firstName": "andy",
        "lastName": "hoh"
    },
    "company": "microsoft",
    "tags": ["azure", "cosmosdb", "search"]
}

Filter query:

SELECT * FROM c WHERE c.company = "microsoft" and c._ts >= @HighWaterMark ORDER BY c._ts

Flattening query:

SELECT c.id, c.userId, c.contact.firstName, c.contact.lastName, c.company, c._ts FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Projection query:

SELECT VALUE { "id":c.id, "Name":c.contact.firstName, "Company":c.company, "_ts":c._ts } FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Array flattening query:

SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags WHERE c._ts >= @HighWaterMark ORDER BY c._ts
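To make the result shapes of these queries more concrete, the sketch below emulates the flattening and array flattening queries on the example document in plain Python. It is not how Cosmos DB evaluates SQL; the function names are hypothetical, and the id and _ts values are made up because the example document does not show them.

```python
def flatten(doc):
    """Emulates: SELECT c.id, c.userId, c.contact.firstName, c.contact.lastName, c.company, c._ts"""
    return {
        "id": doc["id"],
        "userId": doc["userId"],
        "firstName": doc["contact"]["firstName"],
        "lastName": doc["contact"]["lastName"],
        "company": doc["company"],
        "_ts": doc["_ts"],
    }


def flatten_tags(doc):
    """Emulates: SELECT c.id, c.userId, tag, c._ts FROM c JOIN tag IN c.tags"""
    return [
        {"id": doc["id"], "userId": doc["userId"], "tag": tag, "_ts": doc["_ts"]}
        for tag in doc["tags"]
    ]


def run_query(docs, high_water_mark, shape):
    """Applies WHERE c._ts >= @HighWaterMark ORDER BY c._ts, then the given projection."""
    selected = sorted(
        (d for d in docs if d["_ts"] >= high_water_mark), key=lambda d: d["_ts"]
    )
    rows = []
    for d in selected:
        result = shape(d)
        # A JOIN produces several rows per document; a plain projection produces one.
        rows.extend(result if isinstance(result, list) else [result])
    return rows


example = {
    "id": "1",            # assumed value, not shown in the example document
    "_ts": 1574000000,    # assumed value, not shown in the example document
    "userId": 10001,
    "contact": {"firstName": "andy", "lastName": "hoh"},
    "company": "microsoft",
    "tags": ["azure", "cosmosdb", "search"],
}

flat_rows = run_query([example], 0, flatten)
tag_rows = run_query([example], 0, flatten_tags)
```

Here flat_rows holds a single flat record, while tag_rows holds one record per entry in tags, each repeating the parent document's id.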

3 - Create a target search index

Create a target Azure Cognitive Search index if you don't have one already. The following example creates an index with an id field and a description field:

POST https://[service name].search.windows.net/indexes?api-version=2019-05-06
Content-Type: application/json
api-key: [Search service admin key]

{
   "name": "mysearchindex",
   "fields": [{
     "name": "id",
     "type": "Edm.String",
     "key": true,
     "searchable": false
   }, {
     "name": "description",
     "type": "Edm.String",
     "filterable": false,
     "sortable": false,
     "facetable": false,
     "suggestions": true
   }]
 }

Ensure that the schema of your target index is compatible with the schema of the source JSON documents or the output of your custom query projection.

Note

For partitioned collections, the default document key is Azure Cosmos DB's _rid property, which Azure Cognitive Search automatically renames to rid because field names cannot start with an underscore character. Also, Azure Cosmos DB _rid values contain characters that are invalid in Azure Cognitive Search keys. For this reason, the _rid values are Base64 encoded.

For MongoDB collections, Azure Cognitive Search automatically renames the _id property to id.

Mapping between JSON Data Types and Azure Cognitive Search Data Types

| JSON data type | Compatible target index field types |
| --- | --- |
| Bool | Edm.Boolean, Edm.String |
| Numbers that look like integers | Edm.Int32, Edm.Int64, Edm.String |
| Numbers that look like floating-points | Edm.Double, Edm.String |
| String | Edm.String |
| Arrays of primitive types, for example ["a", "b", "c"] | Collection(Edm.String) |
| Strings that look like dates | Edm.DateTimeOffset, Edm.String |
| GeoJSON objects, for example { "type": "Point", "coordinates": [long, lat] } | Edm.GeographyPoint |
| Other JSON objects | N/A |
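The mapping above can be sketched as a small lookup over parsed JSON values. This is an illustration only, not the service's own inference logic: the function names are hypothetical, and "looks like a date" is approximated here as "parses as an ISO 8601 timestamp", which is an assumption.

```python
from datetime import datetime


def looks_like_date(text):
    # Assumption: a string "looks like a date" if it parses as ISO 8601.
    try:
        datetime.fromisoformat(text.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def compatible_field_types(value):
    """Return the index field types the table lists for a parsed JSON value."""
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return ["Edm.Boolean", "Edm.String"]
    if isinstance(value, int):
        return ["Edm.Int32", "Edm.Int64", "Edm.String"]
    if isinstance(value, float):
        return ["Edm.Double", "Edm.String"]
    if isinstance(value, str):
        if looks_like_date(value):
            return ["Edm.DateTimeOffset", "Edm.String"]
        return ["Edm.String"]
    if isinstance(value, list) and all(isinstance(v, (str, int, float, bool)) for v in value):
        return ["Collection(Edm.String)"]
    if isinstance(value, dict) and value.get("type") == "Point" and "coordinates" in value:
        return ["Edm.GeographyPoint"]
    return []  # other JSON objects: no compatible field type (N/A)
```

A check like this could be run over a sample document before defining the index, to spot properties such as nested objects that have no compatible field type and would need a flattening query.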

4 - Configure and run the indexer

Once the index and data source have been created, you're ready to create the indexer:

POST https://[service name].search.windows.net/indexers?api-version=2019-05-06
Content-Type: application/json
api-key: [admin key]

{
  "name" : "mycosmosdbindexer",
  "dataSourceName" : "mycosmosdbdatasource",
  "targetIndexName" : "mysearchindex",
  "schedule" : { "interval" : "PT2H" }
}

This indexer runs every two hours (schedule interval is set to "PT2H"). To run an indexer every 30 minutes, set the interval to "PT30M". The shortest supported interval is 5 minutes. The schedule is optional: if omitted, an indexer runs only once when it's created. However, you can run an indexer on-demand at any time.
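The interval values are ISO 8601 durations. As a rough sketch, a client could validate an interval before submitting the indexer definition. The parser below is a hypothetical helper that only understands the time-only forms used in this article (hours, minutes, seconds after "PT"); it does not cover every duration form the service may accept.

```python
import re
from datetime import timedelta

# Matches time-only ISO 8601 durations such as PT2H, PT30M, or PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")
MIN_INTERVAL = timedelta(minutes=5)  # shortest supported interval, per the article


def parse_interval(text):
    """Parse a schedule interval and reject values shorter than 5 minutes."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported interval: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    interval = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    if interval < MIN_INTERVAL:
        raise ValueError("the shortest supported interval is 5 minutes")
    return interval


every_two_hours = parse_interval("PT2H")
every_half_hour = parse_interval("PT30M")
```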

For more details on the Create Indexer API, check out Create Indexer.

For more information about defining indexer schedules, see How to schedule indexers for Azure Cognitive Search.

Use .NET

The generally available .NET SDK has full parity with the generally available REST API. We recommend that you review the previous REST API section to learn concepts, workflow, and requirements. You can then refer to the .NET API reference documentation to implement a JSON indexer in managed code.

Indexing changed documents

The purpose of a data change detection policy is to efficiently identify changed data items. Currently, the only supported policy is the HighWaterMarkChangeDetectionPolicy using the _ts (timestamp) property provided by Azure Cosmos DB, which is specified as follows:

{
    "@odata.type" : "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName" : "_ts"
}

Using this policy is highly recommended to ensure good indexer performance.

If you are using a custom query, make sure that the _ts property is projected by the query.

Incremental progress and custom queries

Incremental progress during indexing ensures that if indexer execution is interrupted by transient failures or execution time limits, the indexer can pick up where it left off the next time it runs, instead of having to reindex the entire collection from scratch. This is especially important when indexing large collections.

To enable incremental progress when using a custom query, ensure that your query orders the results by the _ts column. This enables periodic checkpointing that Azure Cognitive Search uses to provide incremental progress in the presence of failures.

In some cases, even if your query contains an ORDER BY [collection alias]._ts clause, Azure Cognitive Search may not infer that the query is ordered by _ts. You can tell Azure Cognitive Search that results are ordered by using the assumeOrderByHighWaterMarkColumn configuration property. To specify this hint, create or update your indexer as follows:

{
 ... other indexer definition properties
 "parameters" : {
        "configuration" : { "assumeOrderByHighWaterMarkColumn" : true } }
} 
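The reason ordering by _ts matters can be seen in a toy model of high water mark checkpointing. The sketch below is a simplified in-memory simulation, not the service's implementation; run_indexer, the max_docs interruption, and the sample documents are all assumptions made for illustration.

```python
def run_indexer(docs, checkpoint, index, max_docs=None):
    """Index documents with _ts >= checkpoint in _ts order and return the new checkpoint."""
    pending = sorted((d for d in docs if d["_ts"] >= checkpoint), key=lambda d: d["_ts"])
    for count, doc in enumerate(pending):
        if max_docs is not None and count >= max_docs:
            break  # simulated interruption, for example an execution time limit
        index[doc["id"]] = doc   # upsert into the in-memory stand-in for the search index
        checkpoint = doc["_ts"]  # safe to advance only because results are ordered by _ts
    return checkpoint


# Sample documents in arbitrary order; the _ts values are made up.
docs = [
    {"id": "c", "_ts": 300},
    {"id": "a", "_ts": 100},
    {"id": "d", "_ts": 400},
    {"id": "b", "_ts": 200},
]

index = {}
first_checkpoint = run_indexer(docs, 0, index, max_docs=2)      # interrupted after two documents
indexed_after_first_run = sorted(index)
final_checkpoint = run_indexer(docs, first_checkpoint, index)   # resumes from the checkpoint
```

If the results were not ordered by _ts, advancing the checkpoint after each document could skip older documents that had not been processed yet, which is why the ordering guarantee (or the assumeOrderByHighWaterMarkColumn hint) is needed.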

Indexing deleted documents

When documents are deleted from the collection, you normally want to delete them from the search index as well. The purpose of a data deletion detection policy is to efficiently identify deleted data items. Currently, the only supported policy is the Soft Delete policy (deletion is marked with a flag of some sort), which is specified as follows:

{
    "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
    "softDeleteColumnName" : "the property that specifies whether a document was deleted",
    "softDeleteMarkerValue" : "the value that identifies a document as deleted"
}

If you are using a custom query, make sure that the property referenced by softDeleteColumnName is projected by the query.

The following example creates a data source with a soft-deletion policy:

POST https://[service name].search.windows.net/datasources?api-version=2019-05-06
Content-Type: application/json
api-key: [Search service admin key]

{
    "name": "mycosmosdbdatasource",
    "type": "cosmosdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://myCosmosDbEndpoint.documents.azure.com;AccountKey=myCosmosDbAuthKey;Database=myCosmosDbDatabaseId"
    },
    "container": { "name": "myCosmosDbCollectionId" },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    },
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": "isDeleted",
        "softDeleteMarkerValue": "true"
    }
}
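As a rough illustration of what a soft-delete policy means for the index, the sketch below applies the policy from the example above to a batch of changed documents. It is a simplified model, not the service's implementation. In particular, how the service compares softDeleteMarkerValue with non-string property values is not stated in this article; comparing against the JSON text of the value (so that true matches "true") is an assumption, and the function names are hypothetical.

```python
import json


def is_soft_deleted(doc, policy):
    """Return True if the document carries the soft-delete marker."""
    column = policy["softDeleteColumnName"]
    if column not in doc:
        return False
    value = doc[column]
    # Assumption: non-string values are compared by their JSON text, so True matches "true".
    text = value if isinstance(value, str) else json.dumps(value)
    return text == policy["softDeleteMarkerValue"]


def apply_changes(docs, policy, index):
    """Upsert live documents and remove soft-deleted ones from the in-memory index."""
    for doc in docs:
        if is_soft_deleted(doc, policy):
            index.pop(doc["id"], None)
        else:
            index[doc["id"]] = doc
    return index


policy = {"softDeleteColumnName": "isDeleted", "softDeleteMarkerValue": "true"}
index = {"1": {"id": "1"}, "2": {"id": "2"}}
changed = [{"id": "1", "isDeleted": True}, {"id": "3", "isDeleted": False}]
apply_changes(changed, policy, index)
```

After the call, document 1 is gone from the index, document 2 is untouched, and document 3 has been added.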

Next steps

Congratulations! You have learned how to integrate Azure Cosmos DB with Azure Cognitive Search using an indexer.