Data import overview - Azure Cognitive Search

In Azure Cognitive Search, queries execute over content that has been loaded into and saved in a search index. This article examines the two basic approaches for populating an index: push your data into the index programmatically, or point an Azure Cognitive Search indexer at a supported data source to pull in the data.

With either approach, the objective is to load data from an external data source into an Azure Cognitive Search index. Azure Cognitive Search lets you create an empty index, but the index is not queryable until you push or pull data into it.

Pushing data to an index

The push model, used to programmatically send your data to Azure Cognitive Search, is the most flexible approach. First, it has no restrictions on data source type. Any dataset composed of JSON documents can be pushed to an Azure Cognitive Search index, provided each document in the dataset has fields that map to fields defined in your index schema. Second, it has no restrictions on frequency of execution. You can push changes to an index as often as you like. For applications with very low latency requirements (for example, if search operations must stay in sync with a dynamic inventory database), the push model is your only option.

This approach is more flexible than the pull model because you can upload documents individually or in batches (up to 1000 documents or 16 MB per batch, whichever limit is reached first). The push model also lets you upload documents to Azure Cognitive Search regardless of where your data is located.
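
The batch limits above can be enforced on the client before any upload. The following is a minimal sketch, not part of any SDK: the function name `split_into_batches` and the sample `hotelId`/`name` fields are hypothetical, and document size is approximated by the UTF-8 length of each document's JSON serialization (the request envelope adds some overhead, so a real client may want a safety margin).

```python
import json

MAX_DOCS_PER_BATCH = 1000            # per-batch document limit stated above
MAX_BATCH_BYTES = 16 * 1024 * 1024   # 16 MB, assumed here to mean 16 * 2**20 bytes


def split_into_batches(documents, max_docs=MAX_DOCS_PER_BATCH, max_bytes=MAX_BATCH_BYTES):
    """Yield lists of documents that stay within both batch limits."""
    batch, batch_bytes = [], 0
    for doc in documents:
        # Approximate the document's size by its serialized JSON length.
        doc_bytes = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
        if doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the batch size limit")
        # Close the current batch if adding this document would break a limit.
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch


# Hypothetical sample data: 2500 small documents.
docs = [{"hotelId": str(i), "name": f"Hotel {i}"} for i in range(2500)]
batches = list(split_into_batches(docs))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each yielded batch can then be sent as one indexing request, whichever API you use.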

How to push data to an Azure Cognitive Search index

You can load single or multiple documents into an index by using the REST API or the .NET SDK.

There is currently no tool support for pushing data via the portal.

For an introduction to each methodology, see Quickstart: Create an Azure Cognitive Search index using PowerShell or C# Quickstart: Create an Azure Cognitive Search index using .NET SDK.

Indexing actions: upload, merge, mergeOrUpload, delete

You can control the type of indexing action on a per-document basis, specifying whether the document should be uploaded in full, merged with existing document content, or deleted.

In the REST API, issue HTTP POST requests with JSON request bodies to your Azure Cognitive Search index's endpoint URL. Each JSON object in the "value" array contains the document's key and specifies whether the indexing action adds, updates, or deletes document content. For a code example, see Load documents.
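
As an illustration of the request shape just described, the sketch below builds (but does not send) such a POST request using only the Python standard library. The service name, index name, key placeholder, and the `hotelId`/`name`/`tags` fields are hypothetical; the `/docs/index` path and the `api-key` header are taken from the REST API's document-indexing operation rather than from this article, so check them against the REST reference for your API version.

```python
import json
import urllib.request

# Hypothetical placeholders; substitute your own service, index, and admin key.
SERVICE_NAME = "my-search-service"
INDEX_NAME = "hotels"
API_KEY = "<admin-api-key>"
API_VERSION = "2019-05-06"


def build_index_request(actions):
    """Build (but do not send) a POST request that applies indexing actions."""
    url = (f"https://{SERVICE_NAME}.search.windows.net"
           f"/indexes/{INDEX_NAME}/docs/index?api-version={API_VERSION}")
    # Every action object goes into the top-level "value" array.
    body = json.dumps({"value": actions}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": API_KEY}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Each object carries the document key plus its @search.action.
actions = [
    {"@search.action": "upload", "hotelId": "1", "name": "Seaside Inn", "tags": ["budget"]},
    {"@search.action": "merge", "hotelId": "2", "tags": ["economy", "pool"]},
    {"@search.action": "delete", "hotelId": "3"},
]
request = build_index_request(actions)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```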

In the .NET SDK, package up your data into an IndexBatch object. An IndexBatch encapsulates a collection of IndexAction objects, each of which contains a document and a property that tells Azure Cognitive Search what action to perform on that document. For a code example, see the C# Quickstart.

| @search.action | Description | Necessary fields for each document | Notes |
| --- | --- | --- | --- |
| upload | An upload action is similar to an "upsert": the document is inserted if it is new and updated/replaced if it exists. | key, plus any other fields you wish to define | When updating/replacing an existing document, any field that is not specified in the request is set to null. This occurs even when the field was previously set to a non-null value. |
| merge | Updates an existing document with the specified fields. If the document does not exist in the index, the merge fails. | key, plus any other fields you wish to define | Any field you specify in a merge replaces the existing field in the document. In the .NET SDK, this includes fields of type DataType.Collection(DataType.String). In the REST API, this includes fields of type Collection(Edm.String). For example, if the document contains a field tags with value ["budget"] and you execute a merge with value ["economy", "pool"] for tags, the final value of the tags field will be ["economy", "pool"]. It will not be ["budget", "economy", "pool"]. |
| mergeOrUpload | Behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document. | key, plus any other fields you wish to define | - |
| delete | Removes the specified document from the index. | key only | Any fields you specify other than the key field are ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null. |
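
To make the differences between the four actions concrete, here is a small in-memory model of the semantics in the table. It is only a sketch for reasoning about outcomes, not a client for the service: `apply_action`, the `hotelId` key field, and the sample documents are hypothetical, and the model treats "all fields" as the fields the stored document already has rather than the full index schema.

```python
def apply_action(index, action, key_field="hotelId"):
    """Apply one indexing action to `index`, a dict mapping key -> document.

    A local model of the table's semantics, not a call to the service.
    """
    fields = {k: v for k, v in action.items() if k != "@search.action"}
    kind = action["@search.action"]
    key = fields[key_field]
    exists = key in index

    # mergeOrUpload picks its behavior based on whether the key exists.
    if kind == "mergeOrUpload":
        kind = "merge" if exists else "upload"

    if kind == "upload":
        # Full replace: fields missing from the request become null (None).
        known = set(index[key]) if exists else set()
        index[key] = {name: fields.get(name) for name in known | set(fields)}
    elif kind == "merge":
        if not exists:
            raise KeyError(f"merge failed: no document with key {key!r}")
        # Specified fields overwrite existing ones, collections included.
        index[key].update(fields)
    elif kind == "delete":
        # Only the key matters; any other fields in the action are ignored.
        index.pop(key, None)
    else:
        raise ValueError(f"unknown action {kind!r}")


index = {}
apply_action(index, {"@search.action": "upload", "hotelId": "1",
                     "name": "Seaside Inn", "tags": ["budget"]})
apply_action(index, {"@search.action": "merge", "hotelId": "1",
                     "tags": ["economy", "pool"]})
print(index["1"]["tags"])  # ['economy', 'pool'] (replaced, not appended)

apply_action(index, {"@search.action": "upload", "hotelId": "1", "tags": ["spa"]})
print(index["1"]["name"])  # None (upload nulls the unspecified field)
```

The last call shows why merge or mergeOrUpload is usually the safer choice for partial updates: an upload that omits a field clears it.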

Decide which indexing action to use

The .NET SDK supports the same four indexing actions: upload, merge, delete, and mergeOrUpload. Depending on which action you choose, only certain fields must be included for each document, as listed in the table of @search.action values.

Formulate your query

There are two ways to search your index using the REST API. One way is to issue an HTTP POST request in which your query parameters are defined in a JSON object in the request body. The other way is to issue an HTTP GET request in which your query parameters are defined within the request URL. POST has more relaxed limits on the size of query parameters than GET. For this reason, we recommend using POST unless you have special circumstances where using GET would be more convenient.

For both POST and GET, you need to provide your service name, index name, and the proper API version (2019-05-06 at the time of publishing this document) in the request URL. For GET, the query string at the end of the URL is where you provide the query parameters. The URL format is as follows:

https://[service name].search.windows.net/indexes/[index name]/docs?[query string]&api-version=2019-05-06

The format for POST is similar, except that the path ends in /docs/search and the query string contains only api-version; all other query parameters go in the request body.
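
The two request styles can be compared side by side with a short sketch. The service and index names are hypothetical placeholders and nothing is sent over the network. One assumption to flag: the POST path below ends in `/docs/search`, as in the Search Documents REST reference, and only the `search` and `searchMode` parameters are used because they carry the same name in both styles.

```python
import json
from urllib.parse import urlencode

SERVICE_NAME = "my-search-service"   # hypothetical service name
INDEX_NAME = "hotels"                # hypothetical index name
API_VERSION = "2019-05-06"
BASE_URL = f"https://{SERVICE_NAME}.search.windows.net/indexes/{INDEX_NAME}/docs"


def build_get_url(params):
    """GET: query parameters and api-version all travel in the URL."""
    query = urlencode({**params, "api-version": API_VERSION})
    return f"{BASE_URL}?{query}"


def build_post(params):
    """POST: only api-version is in the URL; parameters go in the JSON body."""
    url = f"{BASE_URL}/search?api-version={API_VERSION}"
    return url, json.dumps(params)


params = {"search": "budget pool", "searchMode": "all"}
get_url = build_get_url(params)
post_url, post_body = build_post(params)
print(get_url)
print(post_url)
print(post_body)
```

Because the POST body is plain JSON, long filters or large parameter sets do not run into URL length limits, which is the practical reason POST is recommended above.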

Pulling data into an index

The pull model crawls a supported data source and automatically uploads the data into your index. In Azure Cognitive Search, this capability is implemented through indexers, which are currently available only for specific supported platforms.

Indexers connect an index to a data source (usually a table, view, or equivalent structure) and map source fields to equivalent fields in the index. During execution, the rowset is automatically transformed to JSON and loaded into the specified index. All indexers support scheduling, so you can specify how frequently the data is refreshed. Most indexers provide change tracking if the data source supports it. By tracking changes and deletes to existing documents in addition to recognizing new documents, indexers remove the need to actively manage the data in your index.
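
The pieces described above (data source, target index, field mappings, schedule) come together in an indexer definition. The sketch below assembles one as JSON; the property names follow the REST API's indexer definition as best understood here and should be verified against the Create Indexer reference, while the helper name and all sample values (`hotels-indexer`, `hotels-sql`, `HotelName`, and so on) are hypothetical.

```python
import json


def build_indexer_definition(name, data_source, target_index, interval,
                             field_mappings=None):
    """Assemble an indexer definition as a plain dict (nothing is sent)."""
    definition = {
        "name": name,
        "dataSourceName": data_source,       # existing data source to crawl
        "targetIndexName": target_index,     # index that receives the documents
        "schedule": {"interval": interval},  # ISO 8601 duration, e.g. "PT2H"
    }
    if field_mappings:
        # Map source columns to index fields when their names differ.
        definition["fieldMappings"] = [
            {"sourceFieldName": source, "targetFieldName": target}
            for source, target in field_mappings.items()
        ]
    return definition


indexer = build_indexer_definition(
    name="hotels-indexer",
    data_source="hotels-sql",
    target_index="hotels",
    interval="PT2H",
    field_mappings={"HotelName": "name"},
)
print(json.dumps(indexer, indent=2))
```

A definition like this would be submitted through the REST API or built with the equivalent .NET SDK types; the portal generates a comparable definition for you.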

How to pull data into an Azure Cognitive Search index

Indexer functionality is exposed in the Azure portal, the REST API, and the .NET SDK.

An advantage of using the portal is that Azure Cognitive Search can usually generate a default index schema for you by reading the metadata of the source dataset. You can modify the generated index until the index is processed, after which the only schema edits allowed are those that do not require reindexing. If the changes you want to make affect the schema directly, you need to rebuild the index.

Verify data import with Search explorer

A quick way to perform a preliminary check on the document upload is to use Search explorer in the portal. The explorer lets you query an index without having to write any code. The search experience is based on default settings, such as the simple syntax and the default searchMode query parameter. Results are returned in JSON so that you can inspect the entire document.

Tip

Numerous Azure Cognitive Search code samples include embedded or readily available datasets, offering an easy way to get started. The portal also provides a sample indexer and data source consisting of a small real estate dataset (named "realestate-us-sample"). When you run the preconfigured indexer on the sample data source, an index is created and loaded with documents that can then be queried in Search explorer or by code that you write.

See also