Indexers in Azure Search

An indexer in Azure Search is a crawler that extracts searchable data and metadata from an external Azure data source and populates an index based on field-to-field mappings between the index and your data source. This approach is sometimes called a "pull model" because the service pulls data in without you having to write any code that pushes data to an index.

Indexers are based on data source types or platforms, with individual indexers for SQL Server on Azure, Cosmos DB, Azure Table Storage, Blob Storage, and so forth.

You can use an indexer as the sole means of data ingestion, or combine it with other techniques so that the indexer loads only some of the fields in your index.

You can run indexers on demand or on a recurring data refresh schedule that runs as often as every fifteen minutes. More frequent updates require a push model that simultaneously updates data in both Azure Search and your external data source.
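As a minimal sketch of the scheduling constraint described above, the helper below formats a refresh interval as an ISO 8601 duration string (for example `PT15M`) and rejects anything shorter than fifteen minutes. The function name `schedule_interval` and the constant `MIN_INTERVAL` are hypothetical, and the assumption that the service expects the interval in this duration format should be checked against the REST API reference.

```python
from datetime import timedelta

# Assumption taken from the text above: a schedule runs at most every 15 minutes.
MIN_INTERVAL = timedelta(minutes=15)


def schedule_interval(interval: timedelta) -> str:
    """Format a refresh interval as an ISO 8601 duration such as 'PT15M'.

    Seconds are truncated; only days, hours, and minutes are emitted.
    """
    if interval < MIN_INTERVAL:
        raise ValueError(
            "interval is shorter than 15 minutes; consider a push model instead"
        )
    total_minutes = int(interval.total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)

    text = "P"
    if days:
        text += f"{days}D"
    if hours or minutes:
        text += "T"
        if hours:
            text += f"{hours}H"
        if minutes:
            text += f"{minutes}M"
    return text
```

Calling `schedule_interval(timedelta(minutes=15))` yields `PT15M`, while a five-minute interval raises `ValueError`, which mirrors the point at which the text recommends switching to a push model.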

Approaches for creating and managing indexers

You can create and manage indexers through the portal, the REST API, or the .NET SDK.

Initially, a new indexer is announced as a preview feature. Preview features are introduced in the APIs (REST and .NET) first and are integrated into the portal only after graduating to general availability. If you're evaluating a new indexer, you should plan on writing code.

Supported data sources

Indexers crawl data stores on Azure.

Basic configuration steps

Indexers can offer features that are unique to the data source, so some aspects of indexer or data source configuration vary by indexer type. However, all indexers share the same basic composition and requirements. The steps common to all indexers are covered below.

Step 1: Create a data source

An indexer pulls data from a data source, an object that holds connection information such as a connection string and possibly credentials. Call the Create Datasource REST API or use the DataSource class to create the resource.

Data sources are configured and managed independently of the indexers that use them, which means a data source can be used by multiple indexers to load more than one index at a time.
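The following sketch shows what a Create Datasource call might look like over REST, built with the Python standard library only. The service name, key, data source name, and table name are placeholders, the helper `build_datasource_request` is hypothetical, and the `api-version` value is an assumption; substitute whichever version your service supports. The request is constructed but not sent.

```python
import json
import urllib.request

API_VERSION = "2017-11-11"  # assumption: use the version your service supports


def build_datasource_request(service, api_key, name, source_type,
                             connection_string, container):
    """Build (but do not send) a POST request that creates a data source."""
    body = {
        "name": name,
        "type": source_type,  # for example "azuresql"
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},  # table, collection, or blob container
    }
    url = (
        f"https://{service}.search.windows.net/datasources"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_datasource_request(
    service="my-service",
    api_key="<admin-api-key>",
    name="hotels-datasource",
    source_type="azuresql",
    connection_string="<connection-string>",
    container="Hotels",
)
# To send it: urllib.request.urlopen(request)
```

Because the data source is a separate resource, the same `hotels-datasource` name could later be referenced by more than one indexer.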

Step 2: Create an index

An indexer automates some tasks related to data ingestion, but creating an index is generally not one of them. As a prerequisite, you must have a predefined index with fields that match those in your external data source. For more information about structuring an index, see Create an Index (Azure Search REST API) or Index class. For help with field associations, see Field mappings in Azure Search indexers.


Although indexers cannot generate an index for you, the Import data wizard in the portal can help. In most cases, the wizard can infer an index schema from existing metadata in the source and present a preliminary schema that you can edit in-line while the wizard is active. Once the index is created on the service, further edits in the portal are mostly limited to adding new fields. Consider the wizard for creating, but not revising, an index. For hands-on learning, step through the portal walkthrough.
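To illustrate the prerequisite that index fields match the source, here is a small sketch that builds an index definition from a mapping of field names to types and reports source columns with no matching field. The helper names and the sample fields are hypothetical; only the general shape of the definition (a `name` plus a list of `fields`, one of them marked as the key) is assumed from the Create Index REST API.

```python
def build_index_definition(name, key_field, fields):
    """Build an index definition from {field name: EDM type}."""
    if key_field not in fields:
        raise ValueError(f"key field {key_field!r} is not in the field list")
    if fields[key_field] != "Edm.String":
        # Assumption: the service requires the key field to be a string.
        raise ValueError("the key field must be of type Edm.String")
    return {
        "name": name,
        "fields": [
            {"name": field, "type": edm_type, "key": field == key_field}
            for field, edm_type in fields.items()
        ],
    }


def unmatched_source_columns(index_definition, source_columns):
    """Return source columns that have no same-named field in the index."""
    index_fields = {field["name"] for field in index_definition["fields"]}
    return sorted(col for col in source_columns if col not in index_fields)


index = build_index_definition(
    "hotels",
    key_field="hotelId",
    fields={"hotelId": "Edm.String", "rating": "Edm.Int32"},
)
missing = unmatched_source_columns(index, ["hotelId", "rating", "city"])
```

Columns reported as unmatched either need a new field in the index or a field mapping on the indexer.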

Step 3: Create and schedule the indexer

The indexer definition is a construct that specifies the index, the data source, and a schedule. An indexer can reference a data source from another service, as long as that data source is from the same subscription. For more information about structuring an indexer, see Create Indexer (Azure Search REST API).
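A rough sketch of an indexer definition follows, tying together the three parts named above. The property names `dataSourceName`, `targetIndexName`, and `schedule.interval` follow the Create Indexer REST API as best understood here; the helper `build_indexer_definition` and all resource names are hypothetical placeholders.

```python
import json


def build_indexer_definition(name, data_source_name, target_index_name,
                             interval=None):
    """Build an indexer definition; omit the schedule to run on demand only."""
    definition = {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
    }
    if interval is not None:
        # ISO 8601 duration, for example "PT15M" for every fifteen minutes.
        definition["schedule"] = {"interval": interval}
    return definition


# Two indexers sharing one data source, each loading a different index.
scheduled = build_indexer_definition(
    "hotels-indexer", "hotels-datasource", "hotels", interval="PT15M"
)
on_demand = build_indexer_definition(
    "hotels-archive-indexer", "hotels-datasource", "hotels-archive"
)

body = json.dumps(scheduled, indent=2)  # JSON body for the create request
```

Leaving out the `schedule` property, as in `on_demand`, keeps the indexer available for on-demand runs without a recurring refresh.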

Next steps

Now that you have the basic idea, the next step is to review the requirements and tasks specific to each data source type.