
Create an automatic formula for scaling compute nodes in a Batch pool

Azure Batch can automatically scale pools based on parameters that you define. With automatic scaling, Batch dynamically adds nodes to a pool as task demands increase, and removes compute nodes as they decrease. You can save both time and money by automatically adjusting the number of compute nodes used by your Batch application.

You enable automatic scaling on a pool of compute nodes by associating it with an autoscale formula that you define. The Batch service uses the autoscale formula to determine the number of compute nodes that are needed to execute your workload. Compute nodes may be dedicated nodes or low-priority nodes. Batch responds to service metrics data that is collected periodically. Using this metrics data, Batch adjusts the number of compute nodes in the pool based on your formula and at a configurable interval.

You can enable automatic scaling either when a pool is created, or on an existing pool. You can also change an existing formula on a pool that is configured for autoscaling. Batch enables you to evaluate your formulas before assigning them to pools and to monitor the status of automatic scaling runs.

This article discusses the various entities that make up your autoscale formulas, including variables, operators, operations, and functions. We discuss how to obtain various compute resource and task metrics within Batch. You can use these metrics to adjust your pool's node count based on resource usage and task status. We then describe how to construct a formula and enable automatic scaling on a pool by using both the Batch REST and .NET APIs. Finally, we finish up with a few example formulas.

Important

When you create a Batch account, you can specify the account configuration, which determines whether pools are allocated in a Batch service subscription (the default), or in your user subscription. If you created your Batch account with the default Batch Service configuration, then your account is limited to a maximum number of cores that can be used for processing. The Batch service scales compute nodes only up to that core limit. For this reason, the Batch service may not reach the target number of compute nodes specified by an autoscale formula. See Quotas and limits for the Azure Batch service for information on viewing and increasing your account quotas.

If you created your account with the User Subscription configuration, then your account shares in the core quota for the subscription. For more information, see Virtual Machines limits in Azure subscription and service limits, quotas, and constraints.

Automatic scaling formulas

An automatic scaling formula is a string value that you define that contains one or more statements. The autoscale formula is assigned to a pool's autoScaleFormula element (Batch REST) or CloudPool.AutoScaleFormula property (Batch .NET). The Batch service uses your formula to determine the target number of compute nodes in the pool for the next interval of processing. The formula string cannot exceed 8 KB, can include up to 100 statements that are separated by semicolons, and can include line breaks and comments.
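The size and statement limits can be checked locally before a formula is submitted. The following is a minimal Python sketch, not part of any Batch SDK: the function name check_formula_limits is hypothetical, the size is assumed to be measured in UTF-8 bytes, and statements are counted by naively splitting on semicolons (a semicolon inside a comment would be miscounted).

```python
MAX_FORMULA_BYTES = 8 * 1024   # 8 KB limit stated above
MAX_STATEMENTS = 100           # statement limit stated above


def check_formula_limits(formula: str) -> list:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    # Assumption: the 8 KB limit applies to the UTF-8 encoded formula string.
    size = len(formula.encode("utf-8"))
    if size > MAX_FORMULA_BYTES:
        problems.append(f"formula is {size} bytes; limit is {MAX_FORMULA_BYTES}")
    # Assumption: every non-empty chunk between semicolons is one statement.
    statements = [s for s in formula.split(";") if s.strip()]
    if len(statements) > MAX_STATEMENTS:
        problems.append(
            f"formula has {len(statements)} statements; limit is {MAX_STATEMENTS}")
    return problems


sample_formula = (
    "$variable1 = function1($ServiceDefinedVariable);\n"
    "$variable2 = function2($OtherServiceDefinedVariable, $variable1);"
)
print(check_formula_limits(sample_formula))
```

A check like this only catches the two documented limits; it does not validate formula syntax, which the Batch service does when it evaluates the formula.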

You can think of automatic scaling formulas as a Batch autoscale "language." Formula statements are free-formed expressions that can include both service-defined variables (variables defined by the Batch service) and user-defined variables (variables that you define). They can perform various operations on these values by using built-in types, operators, and functions. For example, a statement might take the following form:

$myNewVariable = function($ServiceDefinedVariable, $myCustomVariable);

Formulas generally contain multiple statements that perform operations on values that are obtained in previous statements. For example, first we obtain a value for variable1, then pass it to a function to populate variable2:

$variable1 = function1($ServiceDefinedVariable);
$variable2 = function2($OtherServiceDefinedVariable, $variable1);

Include these statements in your autoscale formula to arrive at a target number of compute nodes. Dedicated nodes and low-priority nodes each have their own target settings, so that you can define a target for each type of node. An autoscale formula can include a target value for dedicated nodes, a target value for low-priority nodes, or both.

The target number of nodes may be higher, lower, or the same as the current number of nodes of that type in the pool. Batch evaluates a pool's autoscale formula at a specific interval (see automatic scaling intervals). Batch adjusts the target number of each type of node in the pool to the number that your autoscale formula specifies at the time of evaluation.

Sample autoscale formulas

Below are examples of two autoscale formulas, which can be adjusted to work for most scenarios. The variables startingNumberOfVMs and maxNumberofVMs in the example formulas can be adjusted to your needs.

Pending tasks

startingNumberOfVMs = 1;
maxNumberofVMs = 25;
pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
pendingTaskSamples = pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
$TargetDedicatedNodes=min(maxNumberofVMs, pendingTaskSamples);
$NodeDeallocationOption = taskcompletion;

With this autoscale formula, the pool is initially created with a single VM. The $PendingTasks metric defines the number of tasks that are running or queued. The formula finds the average number of pending tasks in the last 180 seconds and sets the $TargetDedicatedNodes variable accordingly. If fewer than 70 percent of the samples for that period are available, the formula falls back to startingNumberOfVMs instead of the average. The formula ensures that the target number of dedicated nodes never exceeds 25 VMs. As new tasks are submitted, the pool automatically grows. As tasks complete, VMs become free one by one and the autoscaling formula shrinks the pool.

This formula scales dedicated nodes, but can be modified to scale low-priority nodes as well.
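To make the decision logic of the pending-tasks formula easier to follow, here is a rough Python sketch of the same arithmetic. It is an illustration only: the function name pending_tasks_target and its parameters are hypothetical, the sample percentage and sample values are passed in rather than read from Batch, and the sketch does not model how the service turns a fractional result into a whole number of nodes.

```python
def pending_tasks_target(sample_percent, pending_samples,
                         starting_vms=1, max_vms=25):
    """Mirror the 'Pending tasks' formula and return the dedicated-node target."""
    # pendingTaskSamples = pendingTaskSamplePercent < 70
    #     ? startingNumberOfVMs : avg($PendingTasks.GetSample(...))
    if sample_percent < 70:
        pending = starting_vms
    else:
        pending = sum(pending_samples) / len(pending_samples)
    # $TargetDedicatedNodes = min(maxNumberofVMs, pendingTaskSamples)
    return min(max_vms, pending)


# Too few samples: fall back to the starting size.
print(pending_tasks_target(50, [40, 40]))
# Enough samples: follow the average number of pending tasks.
print(pending_tasks_target(100, [10, 12, 14]))
# Heavy load: the target is capped at max_vms.
print(pending_tasks_target(100, [40] * 6))
```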

Preempted nodes

maxNumberofVMs = 25;
$TargetDedicatedNodes = min(maxNumberofVMs, $PreemptedNodeCount.GetSample(180 * TimeInterval_Second));
$TargetLowPriorityNodes = min(maxNumberofVMs , maxNumberofVMs - $TargetDedicatedNodes);
$NodeDeallocationOption = taskcompletion;

This example creates a pool that starts with 25 low-priority nodes. Every time a low-priority node is preempted, it is replaced with a dedicated node. As with the first example, the maxNumberofVMs variable prevents the pool from exceeding 25 VMs. This example is useful for taking advantage of low-priority VMs while also ensuring that only a fixed number of preemptions will occur for the lifetime of the pool.
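The split between dedicated and low-priority nodes in this formula can be traced with a small Python sketch. The function name preempted_nodes_targets is hypothetical and the preempted-node samples are supplied by hand; the only behavior assumed from the formula language is that min() over a list of values returns the smallest of all of them.

```python
def preempted_nodes_targets(preempted_samples, max_vms=25):
    """Mirror the 'Preempted nodes' formula; returns (dedicated, low_priority)."""
    # $TargetDedicatedNodes =
    #     min(maxNumberofVMs, $PreemptedNodeCount.GetSample(...))
    dedicated = min([max_vms, *preempted_samples])
    # $TargetLowPriorityNodes =
    #     min(maxNumberofVMs, maxNumberofVMs - $TargetDedicatedNodes)
    low_priority = min(max_vms, max_vms - dedicated)
    return dedicated, low_priority


print(preempted_nodes_targets([0, 0, 0]))   # no preemptions yet
print(preempted_nodes_targets([3, 3, 3]))   # three nodes preempted
```

In both cases the two targets add up to max_vms, which is how the formula keeps the pool at a fixed total size.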

Variables

You can use both service-defined and user-defined variables in your autoscale formulas. The service-defined variables are built in to the Batch service. Some service-defined variables are read-write, and some are read-only. User-defined variables are variables that you define. In the example formulas shown in the previous section, $TargetDedicatedNodes and $PendingTasks are service-defined variables. Variables startingNumberOfVMs and maxNumberofVMs are user-defined variables.

Note

Service-defined variables are always preceded by a dollar sign ($). For user-defined variables, the dollar sign is optional.

The following tables show both read-write and read-only variables that are defined by the Batch service.

You can get and set the values of these service-defined variables to manage the number of compute nodes in a pool:

Read-write service-defined variable | Description
$TargetDedicatedNodes | The target number of dedicated compute nodes for the pool. The number of dedicated nodes is specified as a target because a pool may not always achieve the desired number of nodes. For example, if the target number of dedicated nodes is modified by an autoscale evaluation before the pool has reached the initial target, then the pool may not reach the target. A pool in an account created with the Batch Service configuration may not achieve its target if the target exceeds a Batch account node or core quota. A pool in an account created with the User Subscription configuration may not achieve its target if the target exceeds the shared core quota for the subscription.
$TargetLowPriorityNodes | The target number of low-priority compute nodes for the pool. The number of low-priority nodes is specified as a target because a pool may not always achieve the desired number of nodes. For example, if the target number of low-priority nodes is modified by an autoscale evaluation before the pool has reached the initial target, then the pool may not reach the target. A pool may also not achieve its target if the target exceeds a Batch account node or core quota. For more information on low-priority compute nodes, see Use low-priority VMs with Batch.
$NodeDeallocationOption | The action that occurs when compute nodes are removed from a pool. Possible values are:
  • requeue--The default value. Terminates tasks immediately and puts them back on the job queue so that they are rescheduled. This action ensures the target number of nodes is reached as quickly as possible, but may be less efficient, because any running tasks are interrupted and have to be restarted, wasting any work they had already done.
  • terminate--Terminates tasks immediately and removes them from the job queue.
  • taskcompletion--Waits for currently running tasks to finish and then removes the node from the pool. Use this option to prevent tasks from being interrupted and requeued, which would waste any work the task has done.
  • retaineddata--Waits for all the local task-retained data on the node to be cleaned up before removing the node from the pool.

Note

The $TargetDedicatedNodes variable can also be specified using the alias $TargetDedicated. Similarly, the $TargetLowPriorityNodes variable can be specified using the alias $TargetLowPriority. If both the fully named variable and its alias are set by the formula, the value assigned to the fully named variable will take precedence.

You can get the value of these service-defined variables to make adjustments that are based on metrics from the Batch service:

Read-only service-defined variable | Description
$CPUPercent | The average percentage of CPU usage.
$WallClockSeconds | The number of seconds consumed.
$MemoryBytes | The average number of megabytes used.
$DiskBytes | The average number of gigabytes used on the local disks.
$DiskReadBytes | The number of bytes read.
$DiskWriteBytes | The number of bytes written.
$DiskReadOps | The count of read disk operations performed.
$DiskWriteOps | The count of write disk operations performed.
$NetworkInBytes | The number of inbound bytes.
$NetworkOutBytes | The number of outbound bytes.
$SampleNodeCount | The count of compute nodes.
$ActiveTasks | The number of tasks that are ready to execute but are not yet executing. The $ActiveTasks count includes all tasks that are in the active state and whose dependencies have been satisfied. Any tasks that are in the active state but whose dependencies have not been satisfied are excluded from the $ActiveTasks count. For a multi-instance task, $ActiveTasks will include the number of instances set on the task.
$RunningTasks | The number of tasks in a running state.
$PendingTasks | The sum of $ActiveTasks and $RunningTasks.
$SucceededTasks | The number of tasks that finished successfully.
$FailedTasks | The number of tasks that failed.
$CurrentDedicatedNodes | The current number of dedicated compute nodes.
$CurrentLowPriorityNodes | The current number of low-priority compute nodes, including any nodes that have been preempted.
$PreemptedNodeCount | The number of nodes in the pool that are in a preempted state.

Tip

The read-only, service-defined variables that are shown in the previous table are objects that provide various methods to access data associated with each. For more information, see Obtain sample data later in this article.

Types

These types are supported in a formula:

  • double
  • doubleVec
  • doubleVecList
  • string
  • timestamp--timestamp is a compound structure that contains the following members:

    • year
    • month (1-12)
    • day (1-31)
    • weekday (in the format of number; for example, 1 for Monday)
    • hour (in 24-hour number format; for example, 13 means 1 PM)
    • minute (00-59)
    • second (00-59)
  • timeinterval

    • TimeInterval_Zero
    • TimeInterval_100ns
    • TimeInterval_Microsecond
    • TimeInterval_Millisecond
    • TimeInterval_Second
    • TimeInterval_Minute
    • TimeInterval_Hour
    • TimeInterval_Day
    • TimeInterval_Week
    • TimeInterval_Year
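For readers who think in a general-purpose language, the timestamp and timeinterval types behave much like Python's datetime and timedelta. The sketch below is only an analogy: the Python names that reuse the TimeInterval_ constants are hypothetical stand-ins, and the example date is arbitrary.

```python
from datetime import datetime, timedelta

# Hypothetical Python stand-ins for three of the timeinterval constants.
TimeInterval_Second = timedelta(seconds=1)
TimeInterval_Minute = timedelta(minutes=1)
TimeInterval_Hour = timedelta(hours=1)

# A number multiplied by a time interval is still a time interval,
# as in the expression 180 * TimeInterval_Second used in the sample formulas.
window = 180 * TimeInterval_Second

# An arbitrary example timestamp: Monday, January 6, 2020, at 1 PM.
stamp = datetime(2020, 1, 6, 13, 0, 0)

# The members listed above map onto datetime attributes.
members = {
    "year": stamp.year,
    "month": stamp.month,
    "day": stamp.day,
    "weekday": stamp.isoweekday(),  # 1 for Monday, matching the list above
    "hour": stamp.hour,             # 24-hour format, so 13 means 1 PM
    "minute": stamp.minute,
    "second": stamp.second,
}
print(window, members)
```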

Operations

These operations are allowed on the types that are listed in the previous section.

Operation | Supported operators | Result type
double operator double | +, -, *, / | double
double operator timeinterval | * | timeinterval
doubleVec operator double | +, -, *, / | doubleVec
doubleVec operator doubleVec | +, -, *, / | doubleVec
timeinterval operator double | *, / | timeinterval
timeinterval operator timeinterval | +, - | timeinterval
timeinterval operator timestamp | + | timestamp
timestamp operator timeinterval | + | timestamp
timestamp operator timestamp | - | timeinterval
operator double | -, ! | double
operator timeinterval | - | timeinterval
double operator double | <, <=, ==, >=, >, != | double
string operator string | <, <=, ==, >=, >, != | double
timestamp operator timestamp | <, <=, ==, >=, >, != | double
timeinterval operator timeinterval | <, <=, ==, >=, >, != | double
double operator double | &&, || | double

When testing a double with a ternary operator (double ? statement1 : statement2), nonzero is true, and zero is false.
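The truth rule for the ternary operator can be modeled in a few lines of Python. This is a sketch under assumptions: the helper names ternary and less_than are hypothetical, and the encoding of a comparison result as 1.0 or 0.0 is assumed (the table only states that comparisons return a double). Unlike a real ternary, the Python helper receives both branch values already evaluated.

```python
def ternary(condition: float, if_true, if_false):
    """Model `double ? statement1 : statement2`: nonzero is true, zero is false."""
    return if_true if condition != 0 else if_false


def less_than(a: float, b: float) -> float:
    # Assumption: a comparison yields 1.0 for true and 0.0 for false.
    return 1.0 if a < b else 0.0


# Same shape as the sample formula's
# `pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg(...)`.
print(ternary(less_than(50, 70), 1, 12))   # condition is nonzero
print(ternary(less_than(90, 70), 1, 12))   # condition is zero
print(ternary(-2.5, "yes", "no"))          # any nonzero double counts as true
```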

Functions

These predefined functions are available for you to use in defining an automatic scaling formula.

Function | Return type | Description
avg(doubleVecList) | double | Returns the average value for all values in the doubleVecList.
len(doubleVecList) | double | Returns the length of the vector that is created from the doubleVecList.
lg(double) | double | Returns the log base 2 of the double.
lg(doubleVecList) | doubleVec | Returns the component-wise log base 2 of the doubleVecList. A vec(double) must be explicitly passed for the parameter. Otherwise, the double lg(double) version is assumed.
ln(double) | double | Returns the natural log of the double.
ln(doubleVecList) | doubleVec | Returns the component-wise natural log of the doubleVecList.
log(double) | double | Returns the log base 10 of the double.
log(doubleVecList) | doubleVec | Returns the component-wise log base 10 of the doubleVecList. A vec(double) must be explicitly passed for the single double parameter. Otherwise, the double log(double) version is assumed.
max(doubleVecList) | double | Returns the maximum value in the doubleVecList.
min(doubleVecList) | double | Returns the minimum value in the doubleVecList.
norm(doubleVecList) | double | Returns the two-norm of the vector that is created from the doubleVecList.
percentile(doubleVec v, double p) | double | Returns the percentile element of the vector v.
rand() | double | Returns a random value between 0.0 and 1.0.
range(doubleVecList) | double | Returns the difference between the min and max values in the doubleVecList.
std(doubleVecList) | double | Returns the sample standard deviation of the values in the doubleVecList.
stop() | | Stops evaluation of the autoscaling expression.
sum(doubleVecList) | double | Returns the sum of all the components of the doubleVecList.
time(string dateTime="") | timestamp | Returns the time stamp of the current time if no parameters are passed, or the time stamp of the dateTime string if it is passed. Supported dateTime formats are W3C-DTF and RFC 1123.
val(doubleVec v, double i) | double | Returns the value of the element that is at location i in vector v, with a starting index of zero.

Some of the functions that are described in the previous table can accept a list as an argument. The comma-separated list is any combination of double and doubleVec. For example:

doubleVecList := ( (double | doubleVec)+(, (double | doubleVec) )* )?

The doubleVecList value is converted to a single doubleVec before evaluation. For example, if v = [1,2,3], then calling avg(v) is equivalent to calling avg(1,2,3). Calling avg(v, 7) is equivalent to calling avg(1,2,3,7).
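The flattening rule can be reproduced in Python to check the two equivalences above. This is an illustrative sketch, not Batch code: flatten and avg are hypothetical helpers, with Python lists standing in for doubleVec values.

```python
def flatten(*args):
    """Convert a mix of numbers and lists into one flat list of floats."""
    flat = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flat.extend(float(x) for x in arg)   # a doubleVec contributes all its values
        else:
            flat.append(float(arg))              # a double contributes one value
    return flat


def avg(*args):
    """Average over the flattened argument list."""
    values = flatten(*args)
    return sum(values) / len(values)


v = [1, 2, 3]
print(avg(v) == avg(1, 2, 3))        # True
print(avg(v, 7) == avg(1, 2, 3, 7))  # True
```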

Obtain sample data

Autoscale formulas act on metrics data (samples) that is provided by the Batch service. A formula grows or shrinks pool size based on the values that it obtains from the service. The service-defined variables that were described previously are objects that provide various methods to access data that is associated with that object. For example, the following expression shows a request to get the last five minutes of CPU usage:

$CPUPercent.GetSample(TimeInterval_Minute * 5)
Method | Description
GetSample() | The GetSample() method returns a vector of data samples.

A sample is 30 seconds worth of metrics data. In other words, samples are obtained every 30 seconds. But as noted below, there is a delay between when a sample is collected and when it is available to a formula. As such, not all samples for a given time period may be available for evaluation by a formula.
  • doubleVec GetSample(double count)
    Specifies the number of samples to obtain from the most recent samples that were collected.

    GetSample(1) returns the last available sample. For metrics like $CPUPercent, however, this should not be used because it is impossible to know when the sample was collected. It might be recent, or, because of system issues, it might be much older. It is better in such cases to use a time interval as shown below.
  • doubleVec GetSample((timestamp or timeinterval) startTime [, double samplePercent])
    Specifies a time frame for gathering sample data. Optionally, it also specifies the percentage of samples that must be available in the requested time frame.

    $CPUPercent.GetSample(TimeInterval_Minute * 10) would return 20 samples if all samples for the last 10 minutes are present in the CPUPercent history. If the last minute of history was not available, however, only 18 samples would be returned. In this case:

    $CPUPercent.GetSample(TimeInterval_Minute * 10, 95) would fail because only 90 percent of the samples are available.

    $CPUPercent.GetSample(TimeInterval_Minute * 10, 80) would succeed.
  • doubleVec GetSample((timestamp or timeinterval) startTime, (timestamp or timeinterval) endTime [, double samplePercent])
    Specifies a time frame for gathering data, with both a start time and an end time.

    As mentioned above, there is a delay between when a sample is collected and when it is available to a formula. Consider this delay when you use the GetSample method. See GetSamplePercent below.
GetSamplePeriod() | Returns the period of samples that were taken in a historical sample data set.
Count() | Returns the total number of samples in the metric history.
HistoryBeginTime() | Returns the time stamp of the oldest available data sample for the metric.
GetSamplePercent() | Returns the percentage of samples that are available for a given time interval. For example:

doubleVec GetSamplePercent( (timestamp or timeinterval) startTime [, (timestamp or timeinterval) endTime] )

Because the GetSample method fails if the percentage of samples returned is less than the samplePercent specified, you can use the GetSamplePercent method to check first. Then you can perform an alternate action if insufficient samples are present, without halting the automatic scaling evaluation.
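The check-first pattern can be sketched in Python using the 18-of-20 example from the GetSample() description. Everything here is a stand-in: the exception class, sample_percent, get_sample, and average_or_default are hypothetical names that imitate the described behavior, and the sample history is made up.

```python
class InsufficientSamples(Exception):
    """Stands in for a failed GetSample() call."""


def sample_percent(available, expected):
    """Percentage of the expected samples that are actually available."""
    return 100.0 * available / expected


def get_sample(samples, expected, required_percent=0.0):
    """Return the samples, or fail if fewer than required_percent exist."""
    if sample_percent(len(samples), expected) < required_percent:
        raise InsufficientSamples(
            f"{len(samples)} of {expected} samples available")
    return list(samples)


def average_or_default(samples, expected, required_percent, default):
    """Check the percentage first, so a shortage never raises."""
    if sample_percent(len(samples), expected) < required_percent:
        return default
    values = get_sample(samples, expected, required_percent)
    return sum(values) / len(values)


# 10 minutes at one sample per 30 seconds is 20 expected samples;
# the last minute is missing, so only 18 are available (90 percent).
history = [40.0] * 18

print(average_or_default(history, 20, 80, default=1.0))  # enough samples
print(average_or_default(history, 20, 95, default=1.0))  # falls back to default
```

The sample "Pending tasks" formula earlier in this article uses the same idea: it tests GetSamplePercent against 70 before calling GetSample.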

Samples, sample percentage, and the GetSample() method

The core operation of an autoscale formula is to obtain task and resource metric data and then adjust pool size based on that data. As such, it is important to have a clear understanding of how autoscale formulas interact with metrics data (samples).

Samples

The Batch service periodically takes samples of task and resource metrics and makes them available to your autoscale formulas. These samples are recorded every 30 seconds by the Batch service. However, there is typically a delay between when those samples were recorded and when they are made available to (and can be read by) your autoscale formulas. Additionally, due to various factors such as network or other infrastructure issues, samples may not be recorded for a particular interval.

Sample percentage

When samplePercent is passed to the GetSample() method or the GetSamplePercent() method is called, percent refers to a comparison between the total possible number of samples that are recorded by the Batch service and the number of samples that are available to your autoscale formula.

Let's look at a 10-minute timespan as an example. Because samples are recorded every 30 seconds within a 10-minute timespan, the maximum total number of samples that are recorded by Batch would be 20 samples (2 per minute). However, due to the inherent latency of the reporting mechanism and other issues within Azure, there may be only 15 samples that are available to your autoscale formula for reading. So, for example, for that 10-minute period, only 75% of the total number of samples recorded may be available to your formula.

GetSample() and sample ranges

Your autoscale formulas are going to be growing and shrinking your pools by adding or removing nodes. Because nodes cost you money, you want to ensure that your formulas use an intelligent method of analysis that is based on sufficient data. Therefore, we recommend that you use a trending-type analysis in your formulas. This type grows and shrinks your pools based on a range of collected samples.

To do so, use GetSample(interval look-back start, interval look-back end) to return a vector of samples:

$runningTasksSample = $RunningTasks.GetSample(1 * TimeInterval_Minute, 6 * TimeInterval_Minute);

When the above line is evaluated by Batch, it returns a range of samples as a vector of values. For example:

$runningTasksSample=[1,1,1,1,1,1,1,1,1,1];

Once you've collected the vector of samples, you can then use functions like min(), max(), and avg() to derive meaningful values from the collected range.

For additional security, you can force a formula evaluation to fail if less than a certain sample percentage is available for a particular time period. When you force a formula evaluation to fail, you instruct Batch to cease further evaluation of the formula if the specified percentage of samples is not available. In this case, no change is made to the pool size. To specify a required percentage of samples for the evaluation to succeed, specify it as the third parameter to GetSample(). Here, a requirement of 75 percent of samples is specified:

$runningTasksSample = $RunningTasks.GetSample(60 * TimeInterval_Second, 120 * TimeInterval_Second, 75);

Because there may be a delay in sample availability, it is important to always specify a time range with a look-back start time that is older than one minute. It takes approximately one minute for samples to propagate through the system, so samples in the range (0 * TimeInterval_Second, 60 * TimeInterval_Second) may not be available. Again, you can use the percentage parameter of GetSample() to force a particular sample percentage requirement.
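The arithmetic behind look-back ranges can be sketched in Python. The helper names expected_samples and overlaps_delay are hypothetical; the 30-second sampling period and the roughly one-minute propagation delay are the figures given in the text, and the ranges are expressed in seconds of look-back.

```python
SAMPLE_PERIOD_S = 30        # samples are recorded every 30 seconds
PROPAGATION_DELAY_S = 60    # approximate delay before samples become readable


def expected_samples(a_s: int, b_s: int) -> int:
    """Maximum number of samples recorded within a look-back range."""
    return abs(b_s - a_s) // SAMPLE_PERIOD_S


def overlaps_delay(a_s: int, b_s: int) -> bool:
    """True if the range reaches into the most recent minute,
    where samples may not have propagated yet."""
    return min(a_s, b_s) < PROPAGATION_DELAY_S


# The range from 1 to 6 minutes back spans 5 minutes: up to 10 samples,
# which matches the 10-element example vector shown earlier.
print(expected_samples(60, 360), overlaps_delay(60, 360))
# The range (0 s, 60 s) lies entirely inside the propagation delay.
print(expected_samples(0, 60), overlaps_delay(0, 60))
```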

重要

强烈建议 不要仅依赖自动缩放公式中的 GetSample(1)We strongly recommend that you avoid relying only on GetSample(1) in your autoscale formulas. 这是因为,GetSample(1) 基本上只是向 Batch 服务表明:“不论多久以前检索最后一个样本,请将它提供给我。”This is because GetSample(1) essentially says to the Batch service, "Give me the last sample you have, no matter how long ago you retrieved it." 由于它只是单个样本,而且可能是较旧的样本,因此可能无法代表最近任务或资源状态的全貌。Since it is only a single sample, and it may be an older sample, it may not be representative of the larger picture of recent task or resource state. 如果使用 GetSample(1),请确保它是更大语句的一部分,而不是公式所依赖的唯一数据点。If you do use GetSample(1), make sure that it's part of a larger statement and not the only data point that your formula relies on.

Metrics

You can use both resource and task metrics when you define a formula. You adjust the target number of dedicated nodes in the pool based on the metrics data that you obtain and evaluate. See the Variables section above for more information on each metric.

Metric    Description
Resource

Resource metrics are based on the CPU, bandwidth, and memory usage of compute nodes, and on the number of nodes.

These service-defined variables are useful for making adjustments based on node count:

  • $TargetDedicatedNodes
  • $TargetLowPriorityNodes
  • $CurrentDedicatedNodes
  • $CurrentLowPriorityNodes
  • $PreemptedNodeCount
  • $SampleNodeCount

These service-defined variables are useful for making adjustments based on node resource usage:

  • $CPUPercent
  • $WallClockSeconds
  • $MemoryBytes
  • $DiskBytes
  • $DiskReadBytes
  • $DiskWriteBytes
  • $DiskReadOps
  • $DiskWriteOps
  • $NetworkInBytes
  • $NetworkOutBytes

Task

Task metrics are based on the status of tasks, such as Active, Pending, and Completed. The following service-defined variables are useful for making pool-size adjustments based on task metrics:

  • $ActiveTasks
  • $RunningTasks
  • $PendingTasks
  • $SucceededTasks
  • $FailedTasks

Write an autoscale formula

You build an autoscale formula by forming statements that use the above components, then combining those statements into a complete formula. In this section, we create an example autoscale formula that can perform some real-world scaling decisions.

First, let's define the requirements for our new autoscale formula. The formula should:

  1. Increase the target number of dedicated compute nodes in a pool if CPU usage is high.
  2. Decrease the target number of dedicated compute nodes in a pool when CPU usage is low.
  3. Always restrict the maximum number of dedicated nodes to 400.
  4. When reducing the number of nodes, not remove nodes that are running tasks; if necessary, wait until tasks have finished before removing nodes.

To increase the number of nodes during high CPU usage, define a statement that populates a user-defined variable ($totalDedicatedNodes) with 110 percent of the current number of dedicated nodes ($CurrentDedicatedNodes), but only if the minimum CPU usage sampled during the last 10 minutes was above 70 percent. Otherwise, the statement uses the current number of dedicated nodes.

$totalDedicatedNodes =
    (min($CPUPercent.GetSample(TimeInterval_Minute * 10)) > 0.7) ?
    ($CurrentDedicatedNodes * 1.1) : $CurrentDedicatedNodes;

To decrease the number of dedicated nodes during low CPU usage, the next statement in our formula sets the same $totalDedicatedNodes variable to 90 percent of the current number of dedicated nodes if the average CPU usage in the past 60 minutes was under 20 percent. Otherwise, it keeps the value of $totalDedicatedNodes that we populated in the statement above.

$totalDedicatedNodes =
    (avg($CPUPercent.GetSample(TimeInterval_Minute * 60)) < 0.2) ?
    ($CurrentDedicatedNodes * 0.9) : $totalDedicatedNodes;

Now limit the target number of dedicated compute nodes to a maximum of 400:

$TargetDedicatedNodes = min(400, $totalDedicatedNodes);

Here's the complete formula:

$totalDedicatedNodes =
    (min($CPUPercent.GetSample(TimeInterval_Minute * 10)) > 0.7) ?
    ($CurrentDedicatedNodes * 1.1) : $CurrentDedicatedNodes;
$totalDedicatedNodes =
    (avg($CPUPercent.GetSample(TimeInterval_Minute * 60)) < 0.2) ?
    ($CurrentDedicatedNodes * 0.9) : $totalDedicatedNodes;
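$TargetDedicatedNodes = min(400, $totalDedicatedNodes);
// Requirement 4: let running tasks finish before a node is removed
$NodeDeallocationOption = taskcompletion;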
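To see how the three sizing statements interact, it can help to trace them outside Batch. The following Python sketch mirrors the statements with plain lists standing in for the $CPUPercent samples. The function name and its parameters are illustrative assumptions, the thresholds are copied from the formula, and the sketch does not model how Batch turns a fractional result into a node count.

```python
from statistics import mean


def target_dedicated_nodes(cpu_last_10_min, cpu_last_60_min, current_nodes, cap=400):
    """Trace the three sizing statements of the example formula."""
    # Statement 1: grow by 10 percent if even the lowest sample in the
    # last 10 minutes is above the 0.7 threshold.
    total = current_nodes * 1.1 if min(cpu_last_10_min) > 0.7 else current_nodes
    # Statement 2: shrink by 10 percent if the 60-minute average is below
    # the 0.2 threshold; otherwise keep the value from statement 1.
    total = current_nodes * 0.9 if mean(cpu_last_60_min) < 0.2 else total
    # Statement 3: never exceed the cap.
    return min(cap, total)


busy = target_dedicated_nodes([0.8, 0.9], [0.8, 0.85], current_nodes=100)
idle = target_dedicated_nodes([0.1, 0.1], [0.1, 0.15], current_nodes=100)
capped = target_dedicated_nodes([0.9], [0.9], current_nodes=500)
print(busy, idle, capped)
```

A busy pool of 100 nodes moves toward 110, an idle one toward 90, and a busy pool of 500 nodes is held at the cap of 400.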

Create an autoscale-enabled pool with Batch SDKs

Pool autoscaling can be configured using any of the Batch SDKs, the Batch REST API, the Batch PowerShell cmdlets, and the Batch CLI. This section shows examples for both .NET and Python.

.NET

To create a pool with autoscaling enabled in .NET, follow these steps:

  1. Create the pool with BatchClient.PoolOperations.CreatePool.
  2. Set the CloudPool.AutoScaleEnabled property to true.
  3. Set the CloudPool.AutoScaleFormula property with your autoscale formula.
  4. (Optional) Set the CloudPool.AutoScaleEvaluationInterval property (default is 15 minutes).
  5. Commit the pool with CloudPool.Commit or CommitAsync.

The following code snippet creates an autoscale-enabled pool in .NET. The pool's autoscale formula sets the target number of dedicated nodes to 5 on Mondays, and 1 on every other day of the week. The automatic scaling interval is set to 30 minutes. In this and the other C# snippets in this article, myBatchClient is a properly initialized instance of the BatchClient class.

CloudPool pool = myBatchClient.PoolOperations.CreatePool(
                    poolId: "mypool",
                    virtualMachineSize: "standard_d1_v2",
                    cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "5"));    
pool.AutoScaleEnabled = true;
pool.AutoScaleFormula = "$TargetDedicatedNodes = (time().weekday == 1 ? 5:1);";
pool.AutoScaleEvaluationInterval = TimeSpan.FromMinutes(30);
await pool.CommitAsync();

Important

When you create an autoscale-enabled pool, do not specify the targetDedicatedNodes parameter or the targetLowPriorityNodes parameter on the call to CreatePool. Instead, specify the AutoScaleEnabled and AutoScaleFormula properties on the pool. The values for these properties determine the target number of each type of node. Also, to manually resize an autoscale-enabled pool (for example, with BatchClient.PoolOperations.ResizePoolAsync), first disable automatic scaling on the pool, then resize it.

Automatic scaling interval

By default, the Batch service adjusts a pool's size according to its autoscale formula every 15 minutes. You can configure this interval through the pool's autoscale evaluation interval property, for example CloudPool.AutoScaleEvaluationInterval in Batch .NET.

The minimum interval is five minutes, and the maximum is 168 hours. If an interval outside this range is specified, the Batch service returns a Bad Request (400) error.
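Because an out-of-range interval is rejected by the service, it can be convenient to check the value on the client before sending the request. The following Python sketch does that with the bounds stated above. The helper name is hypothetical and is not part of any Batch SDK.

```python
from datetime import timedelta

# Bounds taken from the text above.
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(hours=168)


def is_valid_evaluation_interval(interval):
    """Return True if the interval lies within the accepted range."""
    return MIN_INTERVAL <= interval <= MAX_INTERVAL


print(is_valid_evaluation_interval(timedelta(minutes=30)))  # within range
print(is_valid_evaluation_interval(timedelta(minutes=1)))   # too short
```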

Note

Autoscaling is not currently intended to respond to changes in less than a minute. Instead, it is intended to adjust the size of your pool gradually as you run a workload.

Python

Similarly, you can create an autoscale-enabled pool with the Python SDK by following these steps:

  1. Create a pool and specify its configuration.
  2. Add the pool to the service client.
  3. Enable autoscale on the pool with a formula you write.

import datetime

import azure.batch.models as batchmodels

# batch_service_client is assumed to be an initialized BatchServiceClient.

# Create a pool; specify configuration
new_pool = batchmodels.PoolAddParameter(
    id="autoscale-enabled-pool",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="Canonical",
            offer="UbuntuServer",
            sku="18.04-LTS",
            version="latest"
        ),
        node_agent_sku_id="batch.node.ubuntu 18.04"),
    vm_size="STANDARD_D1_v2",
    target_dedicated_nodes=0,
    target_low_priority_nodes=0
)
batch_service_client.pool.add(new_pool)  # Add the pool to the service client

formula = """$curTime = time();
             $workHours = $curTime.hour >= 8 && $curTime.hour < 18;
             $isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;
             $isWorkingWeekdayHour = $workHours && $isWeekday;
             $TargetDedicatedNodes = $isWorkingWeekdayHour ? 20:10;"""

# Enable autoscale on the pool that was just added; specify the formula
response = batch_service_client.pool.enable_auto_scale(
    new_pool.id,
    auto_scale_formula=formula,
    auto_scale_evaluation_interval=datetime.timedelta(minutes=10),
    pool_enable_auto_scale_options=None,
    custom_headers=None,
    raw=False)

Tip

More examples of using the Python SDK can be found in the Batch Python Quickstart repository on GitHub.

Enable autoscaling on an existing pool

Each Batch SDK provides a way to enable autoscaling. For example, Batch .NET provides BatchClient.PoolOperations.EnableAutoScaleAsync, and the Python SDK provides the pool.enable_auto_scale operation.

When you enable autoscaling on an existing pool, keep in mind the following points:

  • If automatic scaling is currently disabled on the pool when you issue the request to enable autoscaling, you must specify a valid autoscale formula in the request. You can optionally specify an autoscale evaluation interval. If you do not specify an interval, the default value of 15 minutes is used.

  • If autoscale is currently enabled on the pool, you can specify an autoscale formula, an evaluation interval, or both. You must specify at least one of these properties.

    • If you specify a new autoscale evaluation interval, the existing evaluation schedule is stopped and a new schedule is started. The new schedule's start time is the time at which the request to enable autoscaling was issued.
    • If you omit either the autoscale formula or the evaluation interval, the Batch service continues to use the current value of that setting.
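The rules in the list above can be summarized as a small decision function. The Python sketch below models only which formula and interval end up in effect after a request. It is an illustration of the documented rules, not service code, and the function name and parameters are assumptions.

```python
from datetime import timedelta

DEFAULT_INTERVAL = timedelta(minutes=15)


def resolve_enable_request(enabled, current_formula, current_interval,
                           new_formula=None, new_interval=None):
    """Return the (formula, interval) in effect after an enable request."""
    if not enabled:
        # Autoscale is disabled: a formula is mandatory, the interval is optional.
        if new_formula is None:
            raise ValueError("a valid autoscale formula is required")
        interval = DEFAULT_INTERVAL if new_interval is None else new_interval
        return new_formula, interval
    # Autoscale is already enabled: at least one setting must be supplied.
    if new_formula is None and new_interval is None:
        raise ValueError("specify a formula, an interval, or both")
    # An omitted setting keeps its current value.
    formula = current_formula if new_formula is None else new_formula
    interval = current_interval if new_interval is None else new_interval
    return formula, interval


# Changing only the interval keeps the existing formula.
print(resolve_enable_request(True, "$TargetDedicatedNodes = 1;",
                             timedelta(minutes=30),
                             new_interval=timedelta(minutes=60)))
```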

Note

If you specified values for the targetDedicatedNodes or targetLowPriorityNodes parameters of the CreatePool method when you created the pool in .NET, or for the comparable parameters in another language, those values are ignored when the automatic scaling formula is evaluated.

This C# code snippet uses the Batch .NET library to enable autoscaling on an existing pool:

// Define the autoscaling formula. This formula sets the target number of nodes
// to 5 on Mondays, and 1 on every other day of the week
string myAutoScaleFormula = "$TargetDedicatedNodes = (time().weekday == 1 ? 5:1);";

// Set the autoscale formula on the existing pool
await myBatchClient.PoolOperations.EnableAutoScaleAsync(
    "myexistingpool",
    autoscaleFormula: myAutoScaleFormula);

Update an autoscale formula

To update the formula on an existing autoscale-enabled pool, call the operation to enable autoscaling again with the new formula. For example, if autoscaling is already enabled on myexistingpool when the following .NET code is executed, its autoscale formula is replaced with the contents of myNewFormula.

await myBatchClient.PoolOperations.EnableAutoScaleAsync(
    "myexistingpool",
    autoscaleFormula: myNewFormula);

Update the autoscale interval

To update the autoscale evaluation interval of an existing autoscale-enabled pool, call the operation to enable autoscaling again with the new interval. For example, to set the autoscale evaluation interval to 60 minutes for a pool that's already autoscale-enabled in .NET:

await myBatchClient.PoolOperations.EnableAutoScaleAsync(
    "myexistingpool",
    autoscaleEvaluationInterval: TimeSpan.FromMinutes(60));

Evaluate an autoscale formula

You can evaluate a formula before applying it to a pool. In this way, you can test the formula to see how its statements evaluate before you put the formula into production.

To evaluate an autoscale formula, you must first enable autoscaling on the pool with a valid formula. To test a formula on a pool that doesn't yet have autoscaling enabled, use the one-line formula $TargetDedicatedNodes = 0 when you first enable autoscaling. Then evaluate the formula you want to test, for example with BatchClient.PoolOperations.EvaluateAutoScaleAsync in Batch .NET.

In this Batch .NET code snippet, we evaluate an autoscale formula. If the pool does not have autoscaling enabled, we enable it first. Note that this snippet names its BatchClient instance batchClient.

// First obtain a reference to an existing pool
CloudPool pool = await batchClient.PoolOperations.GetPoolAsync("myExistingPool");

// If autoscaling isn't already enabled on the pool, enable it.
// You can't evaluate an autoscale formula on non-autoscale-enabled pool.
if (pool.AutoScaleEnabled == false)
{
    // We need a valid autoscale formula to enable autoscaling on the
    // pool. This formula is valid, but won't resize the pool:
    await pool.EnableAutoScaleAsync(
        autoscaleFormula: "$TargetDedicatedNodes = $CurrentDedicatedNodes;",
        autoscaleEvaluationInterval: TimeSpan.FromMinutes(5));

    // Batch limits EnableAutoScaleAsync calls to once every 30 seconds.
    // Because we want to apply our new autoscale formula below if it
    // evaluates successfully, and we *just* enabled autoscaling on
    // this pool, we pause here to ensure we pass that threshold.
    await Task.Delay(TimeSpan.FromSeconds(31));

    // Refresh the properties of the pool so that we've got the
    // latest value for AutoScaleEnabled
    await pool.RefreshAsync();
}

// We must ensure that autoscaling is enabled on the pool prior to
// evaluating a formula
if (pool.AutoScaleEnabled == true)
{
    // The formula to evaluate - adjusts target number of nodes based on
    // day of week and time of day
    string myFormula = @"
        $curTime = time();
        $workHours = $curTime.hour >= 8 && $curTime.hour < 18;
        $isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;
        $isWorkingWeekdayHour = $workHours && $isWeekday;
        $TargetDedicatedNodes = $isWorkingWeekdayHour ? 20:10;
    ";

    // Perform the autoscale formula evaluation. Note that this code does not
    // actually apply the formula to the pool.
    AutoScaleRun eval =
        await batchClient.PoolOperations.EvaluateAutoScaleAsync(pool.Id, myFormula);

    if (eval.Error == null)
    {
        // Evaluation success - print the results of the AutoScaleRun.
        // This will display the values of each variable as evaluated by the
        // autoscale formula.
        Console.WriteLine("AutoScaleRun.Results: " +
            eval.Results.Replace("$", "\n    $"));

        // Apply the formula to the pool since it evaluated successfully
        await batchClient.PoolOperations.EnableAutoScaleAsync(pool.Id, myFormula);
    }
    else
    {
        // Evaluation failed, output the message associated with the error
        Console.WriteLine("AutoScaleRun.Error.Message: " +
            eval.Error.Message);
    }
}

Successful evaluation of the formula shown in this code snippet produces results similar to:

AutoScaleRun.Results:
    $TargetDedicatedNodes=10;
    $NodeDeallocationOption=requeue;
    $curTime=2016-10-13T19:18:47.805Z;
    $isWeekday=1;
    $isWorkingWeekdayHour=0;
    $workHours=0
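If you want to inspect individual variables rather than print the whole result, the results text can be split into name and value pairs. The following Python sketch assumes, based on the output above, that the raw string is a semicolon-separated list of $name=value entries. The function name is hypothetical.

```python
def parse_autoscale_results(results):
    """Split '$name=value;$name=value' text into a dict of strings."""
    parsed = {}
    for entry in results.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty fragments, such as after a trailing ';'
        # Split on the first '=' only, so values may contain other characters.
        name, _, value = entry.partition("=")
        parsed[name] = value
    return parsed


raw = ("$TargetDedicatedNodes=10;$NodeDeallocationOption=requeue;"
       "$curTime=2016-10-13T19:18:47.805Z;$isWeekday=1;"
       "$isWorkingWeekdayHour=0;$workHours=0")
values = parse_autoscale_results(raw)
print(values["$TargetDedicatedNodes"])
```

All values are kept as strings; convert them as needed for your own checks.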

Get information about autoscale runs

To ensure that your formula is performing as expected, we recommend that you periodically check the results of the autoscaling runs that Batch performs on your pool. To do so, get (or refresh) a reference to the pool, and examine the properties of its last autoscale run.

In Batch .NET, the CloudPool.AutoScaleRun property has several properties that provide information about the latest automatic scaling run performed on the pool, including Timestamp, Results, and Error.

In the REST API, the Get information about a pool request returns information about the pool, which includes the latest automatic scaling run information in the autoScaleRun property.

The following C# code snippet uses the Batch .NET library to print information about the last autoscaling run on pool myPool:

CloudPool pool = await myBatchClient.PoolOperations.GetPoolAsync("myPool");
Console.WriteLine("Last execution: " + pool.AutoScaleRun.Timestamp);
Console.WriteLine("Result:" + pool.AutoScaleRun.Results.Replace("$", "\n  $"));
Console.WriteLine("Error: " + pool.AutoScaleRun.Error);

Sample output of the preceding snippet:

Last execution: 10/14/2016 18:36:43
Result:
  $TargetDedicatedNodes=10;
  $NodeDeallocationOption=requeue;
  $curTime=2016-10-14T18:36:43.282Z;
  $isWeekday=1;
  $isWorkingWeekdayHour=0;
  $workHours=0
Error:

Example autoscale formulas

Let's look at a few formulas that show different ways to adjust the amount of compute resources in a pool.

Example 1: Time-based adjustment

Suppose you want to adjust the pool size based on the day of the week and time of day. This example shows how to increase or decrease the number of nodes in the pool accordingly.

The formula first obtains the current time. If it's a weekday (1-5) and within working hours (8 AM to 6 PM), the target pool size is set to 20 nodes. Otherwise, it's set to 10 nodes.

$curTime = time();
$workHours = $curTime.hour >= 8 && $curTime.hour < 18;
$isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;
$isWorkingWeekdayHour = $workHours && $isWeekday;
$TargetDedicatedNodes = $isWorkingWeekdayHour ? 20:10;
$NodeDeallocationOption = taskcompletion;
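To check the schedule logic before deploying it, you can reproduce the same decision locally. The Python sketch below is an approximation under one assumption: it uses isoweekday(), where Monday is 1, to stand in for the formula's weekday value over the range 1 to 5. The function name is hypothetical and the caller supplies the timestamp.

```python
from datetime import datetime


def target_nodes_for(now):
    """Return 20 during weekday working hours, otherwise 10."""
    work_hours = 8 <= now.hour < 18
    # isoweekday(): Monday is 1 ... Sunday is 7, so 1..5 is Monday to Friday.
    is_weekday = 1 <= now.isoweekday() <= 5
    return 20 if (work_hours and is_weekday) else 10


# Thursday evening, outside working hours.
print(target_nodes_for(datetime(2016, 10, 13, 19, 18)))
# Thursday morning, inside working hours.
print(target_nodes_for(datetime(2016, 10, 13, 9, 0)))
```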

Example 2: Task-based adjustment

In this example, the pool size is adjusted based on the number of tasks in the queue. Both comments and line breaks are acceptable in formula strings.

// Get pending tasks for the past 15 minutes.
$samples = $PendingTasks.GetSamplePercent(TimeInterval_Minute * 15);
// If we have fewer than 70 percent data points, we use the last sample point,
// otherwise we use the maximum of last sample point and the history average.
$tasks = $samples < 70 ? max(0,$PendingTasks.GetSample(1)) : max( $PendingTasks.GetSample(1), avg($PendingTasks.GetSample(TimeInterval_Minute * 15)));
// If number of pending tasks is not 0, set targetVM to pending tasks, otherwise
// half of current dedicated.
$targetVMs = $tasks > 0? $tasks:max(0, $TargetDedicatedNodes/2);
// The pool size is capped at 20, if target VM value is more than that, set it
// to 20. This value should be adjusted according to your use case.
$TargetDedicatedNodes = max(0, min($targetVMs, 20));
// Set node deallocation mode - let running tasks finish before removing a node
$NodeDeallocationOption = taskcompletion;
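The branching in this formula is easier to follow with concrete numbers. The Python sketch below mirrors the statements, with plain arguments standing in for the sampled values. The function and parameter names are assumptions, and the result is left unrounded because the sketch does not model how Batch converts it to a node count.

```python
from statistics import mean


def task_based_target(sample_percent, last_sample, history, current_target, cap=20):
    """Trace the task-based formula with explicit inputs."""
    if sample_percent < 70:
        # Too few data points: fall back to the last sample only.
        tasks = max(0, last_sample)
    else:
        # Enough data: use the larger of the last sample and the average.
        tasks = max(last_sample, mean(history))
    # No pending tasks: halve the current target instead.
    target_vms = tasks if tasks > 0 else max(0, current_target / 2)
    # Clamp the result to the range 0..cap.
    return max(0, min(target_vms, cap))


print(task_based_target(50, 8, [], current_target=4))         # sparse samples
print(task_based_target(100, 2, [10, 20], current_target=4))  # average wins
print(task_based_target(100, 0, [0, 0], current_target=6))    # idle queue
```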

Example 3: Accounting for parallel tasks

This example adjusts the pool size based on the number of tasks. This formula also takes into account the MaxTasksPerComputeNode value that has been set for the pool. This approach is useful in situations where parallel task execution has been enabled on your pool.

// Determine whether 70 percent of the samples have been recorded in the past
// 15 minutes; if not, use last sample
$samples = $ActiveTasks.GetSamplePercent(TimeInterval_Minute * 15);
$tasks = $samples < 70 ? max(0,$ActiveTasks.GetSample(1)) : max( $ActiveTasks.GetSample(1),avg($ActiveTasks.GetSample(TimeInterval_Minute * 15)));
// Set the number of nodes to add to one-fourth the number of active tasks (the
// MaxTasksPerComputeNode property on this pool is set to 4, adjust this number
// for your use case)
$cores = $TargetDedicatedNodes * 4;
$extraVMs = (($tasks - $cores) + 3) / 4;
$targetVMs = ($TargetDedicatedNodes + $extraVMs);
// Attempt to grow the number of compute nodes to match the number of active
// tasks, with a maximum of 3
$TargetDedicatedNodes = max(0,min($targetVMs,3));
// Keep the nodes active until the tasks finish
$NodeDeallocationOption = taskcompletion;
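The node arithmetic in this formula can be worked through by hand. The Python sketch below repeats it for a given task count. Writing the formula's constants 3 and 4 as tasks_per_node - 1 and tasks_per_node is our generalization, not something the formula itself states, and fractional results are left as they are.

```python
def parallel_task_target(tasks, current_target, tasks_per_node=4, max_nodes=3):
    """Trace the node arithmetic of the parallel-task formula."""
    # Task slots already covered by the current target.
    cores = current_target * tasks_per_node
    # Extra nodes for the uncovered tasks; the added (tasks_per_node - 1)
    # biases the division upward (3 and 4 in the formula).
    extra_vms = ((tasks - cores) + (tasks_per_node - 1)) / tasks_per_node
    target_vms = current_target + extra_vms
    # Clamp the result to the range 0..max_nodes.
    return max(0, min(target_vms, max_nodes))


# 8 active tasks and a current target of 1 node (4 slots):
# extra = (8 - 4 + 3) / 4 = 1.75, so the new target is 2.75.
print(parallel_task_target(8, current_target=1))
# A large backlog is held at the maximum of 3 nodes.
print(parallel_task_target(100, current_target=1))
```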

Example 4: Setting an initial pool size

This example shows a C# code snippet with an autoscale formula that sets the pool size to a specified number of nodes for an initial time period. After the initial time period has elapsed, it adjusts the pool size based on the number of running and active tasks.

The formula in the following code snippet:

  • Sets the initial pool size to four nodes.
  • Does not adjust the pool size within the first 10 minutes of the pool's lifecycle.
  • After 10 minutes, obtains the maximum number of running and active tasks within the past 60 minutes.
    • If both values are 0 (indicating that no tasks were running or active in the last 60 minutes), the pool size is set to 0.
    • If either value is greater than zero, no change is made.

string now = DateTime.UtcNow.ToString("r");
string formula = string.Format(@"
    $TargetDedicatedNodes = {1};
    lifespan         = time() - time(""{0}"");
    span             = TimeInterval_Minute * 60;
    startup          = TimeInterval_Minute * 10;
    ratio            = 50;

    $TargetDedicatedNodes = (lifespan > startup ? (max($RunningTasks.GetSample(span, ratio), $ActiveTasks.GetSample(span, ratio)) == 0 ? 0 : $TargetDedicatedNodes) : {1});
    ", now, 4);

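The decision described in the bullet list for this example can be restated as a short function. The Python sketch below follows that list rather than the formula text. The names are hypothetical, and the sampled maxima are passed in as plain numbers instead of being read from $RunningTasks and $ActiveTasks.

```python
def initial_size_target(lifespan_minutes, max_running, max_active, current_target,
                        initial_nodes=4, startup_minutes=10):
    """Choose the pool size according to the rules listed for Example 4."""
    if lifespan_minutes <= startup_minutes:
        # Still inside the startup window: hold the initial size.
        return initial_nodes
    if max(max_running, max_active) == 0:
        # Nothing ran or was active in the look-back window: scale to zero.
        return 0
    # Otherwise leave the pool size unchanged.
    return current_target


print(initial_size_target(5, 3, 2, current_target=4))   # startup window
print(initial_size_target(30, 0, 0, current_target=4))  # idle pool
print(initial_size_target(30, 2, 0, current_target=4))  # work in progress
```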
Next steps