
How to schedule indexers in Azure Cognitive Search

An indexer normally runs once, immediately after it is created. You can run it again on demand using the portal, the REST API, or the .NET SDK. You can also configure an indexer to run periodically on a schedule.

Indexer scheduling is useful in situations such as these:

  • Source data changes over time, and you want Azure Cognitive Search indexers to process the changed data automatically.
  • The index is populated from multiple data sources, and you want the indexers to run at different times to reduce conflicts.
  • The source data is very large, and you want to spread indexer processing over time. For more information about indexing large volumes of data, see How to index large data sets in Azure Cognitive Search.

The scheduler is a built-in feature of Azure Cognitive Search. You can't use an external scheduler to control search indexers.

Define schedule properties

An indexer schedule has two properties:

  • Interval: the amount of time between scheduled indexer executions. The smallest allowed interval is 5 minutes, and the largest is 24 hours.
  • Start Time (UTC): the time at which the indexer should run for the first time.
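The two properties and their limits can be captured in a small client-side model that rejects out-of-range values before any request is sent. This is a minimal Python sketch: the `IndexerSchedule` class and its validation are hypothetical helpers, not part of any Azure SDK, and requiring a UTC start time is a choice made here to mirror the Start Time (UTC) property.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Limits stated for the Interval property.
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(hours=24)


@dataclass(frozen=True)
class IndexerSchedule:
    """Hypothetical client-side model of an indexer schedule (not an Azure SDK type)."""

    interval: timedelta
    start_time: Optional[datetime] = None  # None: the service uses the current UTC time

    def __post_init__(self) -> None:
        if not MIN_INTERVAL <= self.interval <= MAX_INTERVAL:
            raise ValueError(
                f"interval must be between 5 minutes and 24 hours, got {self.interval}"
            )
        # Reject naive datetimes and non-UTC offsets (utcoffset() is None for naive values).
        if self.start_time is not None and self.start_time.utcoffset() != timedelta(0):
            raise ValueError("start_time must be a timezone-aware UTC datetime")


# Hourly schedule starting June 1, 2019 at 8:00 AM UTC.
hourly = IndexerSchedule(timedelta(hours=1), datetime(2019, 6, 1, 8, 0, tzinfo=timezone.utc))
```

Validating on the client only catches mistakes earlier; the service still enforces its own limits.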

You can specify a schedule when you first create the indexer, or later by updating the indexer's properties. Indexer schedules can be set using the portal, the REST API, or the .NET SDK.

Only one execution of an indexer can run at a time. If an indexer is still running when its next execution is scheduled, that execution is skipped, and the indexer doesn't run again until the following scheduled time.

Let's consider an example to make this more concrete. Suppose we configure an indexer schedule with an Interval of one hour and a Start Time of June 1, 2019 at 8:00 AM UTC. Here's what could happen when an indexer run takes longer than an hour:

  • The first indexer execution starts at or around June 1, 2019 at 8:00 AM UTC. Assume this execution takes 20 minutes (or any time less than one hour).
  • The second execution starts at or around June 1, 2019 at 9:00 AM UTC. Suppose this execution takes 70 minutes (more than an hour) and doesn't complete until 10:10 AM UTC.
  • The third execution is scheduled to start at 10:00 AM UTC, but the previous execution is still running at that time, so this scheduled execution is skipped. The next execution of the indexer doesn't start until 11:00 AM UTC.
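The skip-if-busy behavior in this example can be reproduced with a short simulation. This Python sketch is a simplified model of the documented rule (only one execution at a time, and a slot that arrives while a run is in progress is skipped). It ignores the small start-time jitter implied by "at or around", and the 10-minute duration of the third run is an assumed value added for illustration. `simulate_runs` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def simulate_runs(
    start: datetime, interval: timedelta, durations: List[timedelta]
) -> Tuple[List[Tuple[datetime, datetime]], List[datetime]]:
    """Return (runs, skipped): (begin, end) pairs for executions that ran,
    and the scheduled slots that were skipped because a run was in progress."""
    runs: List[Tuple[datetime, datetime]] = []
    skipped: List[datetime] = []
    busy_until = start  # when the current execution finishes
    slot = start
    pending = list(durations)
    while pending:
        if slot >= busy_until:
            # No execution in progress: this slot runs.
            duration = pending.pop(0)
            runs.append((slot, slot + duration))
            busy_until = slot + duration
        else:
            # Previous execution still running: skip this slot.
            skipped.append(slot)
        slot += interval
    return runs, skipped


runs, skipped = simulate_runs(
    datetime(2019, 6, 1, 8, 0, tzinfo=timezone.utc),
    timedelta(hours=1),
    [timedelta(minutes=20), timedelta(minutes=70), timedelta(minutes=10)],
)
for begin, end in runs:
    print(begin.strftime("%H:%M"), "->", end.strftime("%H:%M"))
# 08:00 -> 08:20
# 09:00 -> 10:10
# 11:00 -> 11:10
print("skipped:", [s.strftime("%H:%M") for s in skipped])
# skipped: ['10:00']
```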

Note

If an indexer is set to a certain schedule but repeatedly fails on the same document each time it runs, the indexer begins running on a less frequent interval (but still at least once every 24 hours) until it successfully makes progress again. If you believe you have fixed the issue that caused the indexer to get stuck at a certain point, you can run the indexer on demand. If that run makes progress, the indexer returns to its configured schedule interval.
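The exact back-off policy is not documented here, so the following Python sketch is purely illustrative. It assumes the interval doubles after each run that makes no progress, is capped at 24 hours (so the indexer still runs at least once a day), and resets to the configured interval once a run makes progress. The `next_interval` function is hypothetical and does not describe the service's actual algorithm.

```python
from datetime import timedelta

# Cap from the note above: the indexer still runs at least once every 24 hours.
MAX_INTERVAL = timedelta(hours=24)


def next_interval(configured: timedelta, current: timedelta, made_progress: bool) -> timedelta:
    """Hypothetical back-off rule: double the interval after a run that makes
    no progress (capped at 24 hours); return to the configured interval on progress."""
    if made_progress:
        return configured
    return min(current * 2, MAX_INTERVAL)


configured = timedelta(hours=1)
current = configured
history = []
# Five runs stuck on the same document, then a run that makes progress.
for progress in [False, False, False, False, False, True]:
    current = next_interval(configured, current, progress)
    history.append(current)
# history holds 2h, 4h, 8h, 16h, 24h, then back to 1h
```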

Schedule in the portal

The Import Data wizard in the portal lets you define the schedule for an indexer at creation time. The default Schedule setting is Hourly, which means the indexer runs once after it is created and then runs again every hour.

You can change the Schedule setting to Once if you don't want the indexer to run again automatically, or to Daily to run it once per day. Set it to Custom if you want to specify a different interval or a specific future Start Time.

When you set the schedule to Custom, fields appear that let you specify the Interval and the Start Time (UTC). The shortest allowed interval is 5 minutes, and the longest is 1440 minutes (24 hours).
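The portal's Schedule options can be thought of as shorthand for REST schedule intervals. This Python sketch assumes Hourly corresponds to `PT1H` and Daily to `P1D`, and treats Once as "no schedule"; `portal_setting_to_interval` is a hypothetical helper, not a portal or SDK feature.

```python
from typing import Optional


def portal_setting_to_interval(setting: str, custom_minutes: Optional[int] = None) -> Optional[str]:
    """Map a portal Schedule setting to an assumed REST interval string (dayTimeDuration)."""
    if setting == "Once":
        return None  # no schedule: the indexer runs only once, after it is created
    if setting == "Hourly":
        return "PT1H"
    if setting == "Daily":
        return "P1D"
    if setting == "Custom":
        # Custom intervals are limited to 5..1440 minutes.
        if custom_minutes is None or not 5 <= custom_minutes <= 1440:
            raise ValueError("Custom interval must be between 5 and 1440 minutes")
        hours, minutes = divmod(custom_minutes, 60)
        return "PT" + (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    raise ValueError(f"unknown Schedule setting: {setting!r}")


print(portal_setting_to_interval("Custom", 90))  # PT1H30M
```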

(Screenshot: Setting the indexer schedule in the Import Data wizard)

After an indexer has been created, you can change the schedule settings in the indexer's Edit panel. The Schedule fields are the same as in the Import Data wizard.

(Screenshot: Setting the schedule in the indexer Edit panel)

Schedule using the REST API

You can define the schedule for an indexer using the REST API. To do this, include the schedule property when creating or updating the indexer. The following example shows a PUT request that updates an existing indexer:

    PUT https://myservice.search.windows.net/indexers/myindexer?api-version=2019-05-06
    Content-Type: application/json
    api-key: admin-key

    {
        "dataSourceName" : "myazuresqldatasource",
        "targetIndexName" : "target index name",
        "schedule" : { "interval" : "PT10M", "startTime" : "2015-01-01T00:00:00Z" }
    }
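The same request can be assembled from a script. This Python sketch uses only the standard library's `urllib.request` to build (but not send) the PUT request shown above. The helper name `build_update_indexer_request` and the placeholder values, including the index name `mytargetindex`, are assumptions; sending the request requires a real service name and admin key.

```python
import json
import urllib.request


def build_update_indexer_request(
    service: str, indexer: str, admin_key: str, body: dict, api_version: str = "2019-05-06"
) -> urllib.request.Request:
    """Build (without sending) a PUT request that creates or updates an indexer."""
    url = f"https://{service}.search.windows.net/indexers/{indexer}?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


req = build_update_indexer_request(
    "myservice",
    "myindexer",
    "<admin-key>",  # placeholder: use your service's admin key
    {
        "dataSourceName": "myazuresqldatasource",
        "targetIndexName": "mytargetindex",  # hypothetical index name
        "schedule": {"interval": "PT10M", "startTime": "2015-01-01T00:00:00Z"},
    },
)
# To send it against a real service: urllib.request.urlopen(req)
```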

The interval parameter is required. It specifies the time between the starts of two consecutive indexer executions. The smallest allowed interval is 5 minutes; the longest is one day. It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an ISO 8601 duration value). The pattern is P(nD)(T(nH)(nM)). Examples: PT15M for every 15 minutes, PT2H for every 2 hours.
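To make the pattern concrete, this Python sketch converts between `timedelta` values and the restricted P(nD)(T(nH)(nM)) form, and enforces the 5-minute to one-day range. `parse_interval` and `format_interval` are hypothetical helpers; they deliberately handle only days, hours, and minutes, as the pattern above does, and reject seconds.

```python
import re
from datetime import timedelta

# Restricted dayTimeDuration pattern: P(nD)(T(nH)(nM)).
_PATTERN = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?$")
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(days=1)


def parse_interval(value: str) -> timedelta:
    """Parse e.g. 'PT15M', 'PT2H', or 'P1D' and check the allowed range."""
    match = _PATTERN.match(value)
    # 'P' alone, or a trailing 'T' with nothing after it, is malformed.
    if not match or value == "P" or value.endswith("T"):
        raise ValueError(f"not a supported dayTimeDuration: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in match.groups())
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(f"interval {value!r} is outside the 5-minute to 1-day range")
    return interval


def format_interval(interval: timedelta) -> str:
    """Format a positive whole number of minutes, e.g. 90 minutes -> 'PT1H30M'."""
    seconds = int(interval.total_seconds())
    if seconds <= 0 or seconds % 60:
        raise ValueError("interval must be a positive whole number of minutes")
    days, rem = divmod(seconds // 60, 1440)
    hours, minutes = divmod(rem, 60)
    time_part = (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    return "P" + (f"{days}D" if days else "") + (f"T{time_part}" if time_part else "")


print(format_interval(parse_interval("PT15M")))  # PT15M
```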

The optional startTime indicates when scheduled executions should begin. If it is omitted, the current UTC time is used. This time can be in the past, in which case the first execution is scheduled as if the indexer had been running continuously since the original startTime.
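One way to read "as if the indexer had been running continuously" is that executions stay aligned to startTime plus whole multiples of the interval. Under that assumption, the next slot after a past startTime can be computed as below. `next_scheduled_run` is a hypothetical helper, and the service's own scheduling may differ in details such as start-time jitter.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_scheduled_run(
    interval: timedelta, now: datetime, start_time: Optional[datetime] = None
) -> datetime:
    """Earliest slot start_time + k * interval (k >= 0) that is not before `now`."""
    if start_time is None:
        return now  # startTime omitted: the schedule begins at the current UTC time
    if start_time >= now:
        return start_time  # future start: the first run is at start_time itself
    elapsed = now - start_time
    periods = -(-elapsed // interval)  # ceiling division of two timedeltas
    return start_time + periods * interval


utc = timezone.utc
# startTime from the REST example (2015-01-01T00:00:00Z) with a 10-minute interval.
print(next_scheduled_run(timedelta(minutes=10),
                         datetime(2019, 6, 1, 8, 3, tzinfo=utc),
                         datetime(2015, 1, 1, tzinfo=utc)))
# 2019-06-01 08:10:00+00:00
```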

You can also run an indexer on demand at any time using the Run Indexer call. For more information about running indexers and setting indexer schedules, see Run Indexer, Get Indexer, and Update Indexer in the REST API Reference.
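For reference, Run Indexer is a POST to `/indexers/[indexer name]/run`, and Get Indexer is a GET on `/indexers/[indexer name]`, which returns the definition including its schedule. This Python sketch only builds the two requests with `urllib.request`; the helper names, the service and indexer names, and the pinned api-version are assumptions taken from or modeled on the example above.

```python
import urllib.request

API_VERSION = "2019-05-06"  # same api-version as the PUT example


def _url(service: str, path: str) -> str:
    return f"https://{service}.search.windows.net/{path}?api-version={API_VERSION}"


def run_indexer_request(service: str, indexer: str, admin_key: str) -> urllib.request.Request:
    """Run Indexer: POST /indexers/{name}/run starts an on-demand execution."""
    return urllib.request.Request(
        _url(service, f"indexers/{indexer}/run"),
        data=b"",  # empty request body
        headers={"api-key": admin_key},
        method="POST",
    )


def get_indexer_request(service: str, indexer: str, admin_key: str) -> urllib.request.Request:
    """Get Indexer: GET /indexers/{name} returns the indexer definition."""
    return urllib.request.Request(
        _url(service, f"indexers/{indexer}"),
        headers={"api-key": admin_key},
        method="GET",
    )


run_req = run_indexer_request("myservice", "myindexer", "<admin-key>")
# Against a real service: urllib.request.urlopen(run_req)
```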

Schedule using the .NET SDK

You can define the schedule for an indexer using the Azure Cognitive Search .NET SDK. To do this, include the schedule property when creating or updating an indexer.

The following C# example creates an indexer using a predefined data source and index, and sets its schedule to run once a day, starting 30 minutes from now:

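    // Assumes the System, Microsoft.Azure.Search, and Microsoft.Azure.Search.Models
    // namespaces are imported; searchService is a SearchServiceClient, and
    // dataSource and index were created earlier.
    Indexer indexer = new Indexer(
        name: "azure-sql-indexer",
        dataSourceName: dataSource.Name,
        targetIndexName: index.Name,
        // Run once a day (interval), starting 30 minutes from now in UTC (startTime).
        schedule: new IndexingSchedule(
                        TimeSpan.FromDays(1),
                        new DateTimeOffset(DateTime.UtcNow.AddMinutes(30))
                    )
        );
    // Creates the indexer, or updates it if one with this name already exists.
    await searchService.Indexers.CreateOrUpdateAsync(indexer);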

If the schedule parameter is omitted, the indexer runs only once, immediately after it is created.

The startTime parameter can be set to a time in the past. In that case, the first execution is scheduled as if the indexer had been running continuously since the given startTime.

The schedule is defined using the IndexingSchedule class. The IndexingSchedule constructor requires an interval parameter specified as a TimeSpan object. The smallest allowed interval is 5 minutes, and the largest is 24 hours. The second parameter, startTime, specified as a DateTimeOffset object, is optional.

The .NET SDK lets you control indexer operations using the SearchServiceClient class and its Indexers property, which implements the methods of the IIndexersOperations interface.

随时可以使用 RunRunAsyncRunWithHttpMessagesAsync 方法按需运行索引器。You can run an indexer on demand at any time using one of the Run, RunAsync, or RunWithHttpMessagesAsync methods.

For more information about creating, updating, and running indexers, see IIndexersOperations.