Run tasks concurrently to maximize usage of Batch compute nodes

By running more than one task simultaneously on each compute node in your Azure Batch pool, you can maximize resource usage on a smaller number of nodes. For some workloads, this can shorten job times and lower cost.

While some scenarios benefit from dedicating all of a node's resources to a single task, several situations benefit from allowing multiple tasks to share those resources:

  • Minimizing data transfer when tasks are able to share data. In this scenario, you can dramatically reduce data transfer charges by copying shared data to a smaller number of nodes and executing tasks in parallel on each node. This especially applies if the data to be copied to each node must be transferred between geographic regions.
  • Maximizing memory usage when tasks require a large amount of memory, but only during short periods of time, and at variable times during execution. You can employ fewer, but larger, compute nodes with more memory to handle such spikes efficiently. Each of these nodes runs multiple tasks in parallel, and each task takes advantage of the node's plentiful memory at a different time.
  • Mitigating node number limits when inter-node communication is required within a pool. Currently, pools configured for inter-node communication are limited to 50 compute nodes. If each node in such a pool is able to execute tasks in parallel, a greater number of tasks can be executed simultaneously.
  • Replicating an on-premises compute cluster, such as when you first move a compute environment to Azure. If your current on-premises solution executes multiple tasks per compute node, you can increase the maximum number of tasks per node to more closely mirror that configuration.

Example scenario

To illustrate the benefits of parallel task execution, suppose that your task application has CPU and memory requirements such that Standard_D1 nodes are sufficient. However, to finish the job in the required time, 1,000 of these nodes are needed.

Instead of using Standard_D1 nodes that have 1 CPU core, you could use Standard_D14 nodes that have 16 cores each, and enable parallel task execution. This requires one-sixteenth as many nodes: instead of 1,000 nodes, only 63 would be required (1,000 / 16 = 62.5, rounded up). Additionally, if large application files or reference data are required on each node, job duration and efficiency improve further, because the data is copied to only 63 nodes.
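The node-count arithmetic in this scenario can be sketched as follows. This is a minimal illustration that assumes each task occupies exactly one core; the function name `nodes_needed` is hypothetical and is not part of any Batch SDK.

```python
import math


def nodes_needed(single_core_nodes: int, cores_per_node: int) -> int:
    """Return how many multi-core nodes replace a pool of single-core nodes.

    Assumes one task per core, so a node with `cores_per_node` cores
    does the work of that many single-core nodes.
    """
    if single_core_nodes < 0 or cores_per_node < 1:
        raise ValueError("node counts must be non-negative and cores positive")
    # Round up: a partially used node still has to be allocated.
    return math.ceil(single_core_nodes / cores_per_node)


# 1,000 Standard_D1 nodes (1 core) replaced by Standard_D14 nodes (16 cores).
print(nodes_needed(1000, 16))
```

Rounding up matters here: 62 nodes would provide only 992 concurrent task slots, which is short of the 1,000 needed.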

Enable parallel task execution

You configure compute nodes for parallel task execution at the pool level. With the Batch .NET library, set the CloudPool.MaxTasksPerComputeNode property when you create a pool. If you are using the Batch REST API, set the maxTasksPerNode element in the request body during pool creation.

Azure Batch allows you to set the number of tasks per node to up to four times (4x) the number of cores on the node. For example, if the pool is configured with nodes of size "Large" (four cores), then maxTasksPerNode may be set to 16. However, regardless of how many cores the node has, you can't have more than 256 tasks per node. For details on the number of cores for each of the node sizes, see Sizes for Cloud Services. For more information on service limits, see Quotas and limits for the Azure Batch service.
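The limit described above (four times the core count, capped at 256) can be expressed as a small helper. This is a sketch for checking a planned value before creating a pool; the function and constant names are hypothetical, and the limits are the ones stated in this article rather than values read from the service.

```python
TASKS_PER_CORE_MULTIPLIER = 4   # "4x the number of cores", per this article
MAX_TASKS_PER_NODE_CAP = 256    # hard upper bound, per this article


def max_tasks_per_node_limit(cores: int) -> int:
    """Return the largest maxTasksPerNode value allowed for a node size."""
    if cores < 1:
        raise ValueError("a node must have at least one core")
    return min(TASKS_PER_CORE_MULTIPLIER * cores, MAX_TASKS_PER_NODE_CAP)


def is_valid_max_tasks_per_node(cores: int, requested: int) -> bool:
    """Check a requested maxTasksPerNode value against the limit."""
    return 1 <= requested <= max_tasks_per_node_limit(cores)


print(max_tasks_per_node_limit(4))         # "Large" node with four cores
print(is_valid_max_tasks_per_node(4, 20))  # exceeds 4 x 4
```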

Tip

Be sure to take into account the maxTasksPerNode value when you construct an autoscale formula for your pool. For example, a formula that evaluates $RunningTasks could be dramatically affected by an increase in tasks per node. For more information, see Automatically scale compute nodes in an Azure Batch pool.
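To see why the tasks-per-node setting matters for autoscaling, consider how many nodes a given number of running tasks actually occupies. The sketch below is plain Python arithmetic, not Batch autoscale formula syntax, and `nodes_for_running_tasks` is a hypothetical helper. It assumes tasks are packed onto as few nodes as possible.

```python
import math


def nodes_for_running_tasks(running_tasks: int, max_tasks_per_node: int) -> int:
    """Return the minimum node count that can host `running_tasks` at once."""
    if running_tasks < 0 or max_tasks_per_node < 1:
        raise ValueError("invalid task count or tasks-per-node value")
    return math.ceil(running_tasks / max_tasks_per_node)


# The same running-task sample implies very different pool sizes.
for tasks_per_node in (1, 4, 16):
    print(tasks_per_node, nodes_for_running_tasks(64, tasks_per_node))
```

A formula that sizes the pool as one node per running task would therefore overprovision by a factor equal to the tasks-per-node value.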

Distribution of tasks

When the compute nodes in a pool can execute tasks concurrently, it's important to specify how you want the tasks to be distributed across the nodes in the pool.

By using the CloudPool.TaskSchedulingPolicy property, you can specify that tasks should be assigned evenly across all nodes in the pool ("spreading"). Or you can specify that as many tasks as possible should be assigned to each node before tasks are assigned to another node in the pool ("packing").

As an example of how this feature is valuable, consider the pool of Standard_D14 nodes (in the example above) that is configured with a CloudPool.MaxTasksPerComputeNode value of 16. If the CloudPool.TaskSchedulingPolicy is configured with a ComputeNodeFillType of Pack, it maximizes usage of all 16 cores of each node and allows an autoscaling pool to prune unused nodes (nodes without any tasks assigned) from the pool. This minimizes resource usage and saves money.
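The difference between the two policies can be illustrated with a toy model. This is a simplified simulation that assumes all tasks are assigned at once; it is not how the Batch scheduler is implemented, and `assign_tasks` is a hypothetical function.

```python
def assign_tasks(task_count: int, node_count: int,
                 slots_per_node: int, fill_type: str) -> list[int]:
    """Return the number of tasks placed on each node under a fill policy."""
    if task_count > node_count * slots_per_node:
        raise ValueError("more tasks than available task slots")
    loads = [0] * node_count
    if fill_type == "pack":
        # Fill one node completely before moving on to the next.
        remaining = task_count
        for i in range(node_count):
            loads[i] = min(slots_per_node, remaining)
            remaining -= loads[i]
    elif fill_type == "spread":
        # Hand tasks out round-robin so every node gets a similar share.
        for task in range(task_count):
            loads[task % node_count] += 1
    else:
        raise ValueError("fill_type must be 'pack' or 'spread'")
    return loads


# 32 tasks on 4 nodes with 16 task slots each.
packed = assign_tasks(32, 4, 16, "pack")
spread = assign_tasks(32, 4, 16, "spread")
print(packed, "idle nodes:", packed.count(0))
print(spread, "idle nodes:", spread.count(0))
```

In this model, packing leaves two of the four nodes without tasks, which an autoscale formula could then remove. Spreading keeps all four nodes busy at half capacity.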

Batch .NET example

This Batch .NET API code snippet shows a request to create a pool that contains four nodes with a maximum of four tasks per node. It specifies a task scheduling policy that fills each node with tasks before assigning tasks to another node in the pool. For more information on adding pools by using the Batch .NET API, see BatchClient.PoolOperations.CreatePool.

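// Define a pool with four dedicated nodes.
CloudPool pool =
    batchClient.PoolOperations.CreatePool(
        poolId: "mypool",
        targetDedicatedComputeNodes: 4,
        virtualMachineSize: "standard_d1_v2",
        cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "5"));

// Allow up to four tasks to run concurrently on each node.
pool.MaxTasksPerComputeNode = 4;
// Fill each node with tasks before assigning tasks to the next node.
pool.TaskSchedulingPolicy = new TaskSchedulingPolicy(ComputeNodeFillType.Pack);
// Submit the pool definition to the Batch service.
pool.Commit();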

Batch REST example

This Batch REST API snippet shows a request to create a pool that contains two large nodes with a maximum of four tasks per node. For more information on adding pools by using the REST API, see Add a pool to an account.

{
  "odata.metadata":"https://myaccount.myregion.batch.azure.com/$metadata#pools/@Element",
  "id":"mypool",
  "vmSize":"large",
  "cloudServiceConfiguration": {
    "osFamily":"4",
    "targetOSVersion":"*"
  },
  "targetDedicatedComputeNodes":2,
  "maxTasksPerNode":4,
  "enableInterNodeCommunication":true
}

Note

You can set the maxTasksPerNode element and the MaxTasksPerComputeNode property only at pool creation time. They cannot be modified after a pool has been created.

Code sample

The ParallelNodeTasks project on GitHub illustrates the use of the CloudPool.MaxTasksPerComputeNode property.

This C# console application uses the Batch .NET library to create a pool with one or more compute nodes. It executes a configurable number of tasks on those nodes to simulate variable load. Output from the application specifies which node executed each task. The application also provides a summary of the job parameters and duration. The summary portion of the output from two different runs of the sample application appears below.

Nodes: 1
Node size: large
Max tasks per node: 1
Tasks: 32
Duration: 00:30:01.4638023

The first run of the sample application shows that with a single node in the pool and the default setting of one task per node, the job duration is just over 30 minutes.

Nodes: 1
Node size: large
Max tasks per node: 4
Tasks: 32
Duration: 00:08:48.2423500

The second run of the sample shows a significant decrease in job duration. This is because the pool was configured with four tasks per node, which allows parallel task execution to complete the job in nearly a quarter of the time (about 29% of the first run's duration).
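The speedup between the two runs can be computed directly from the reported durations. The sketch below parses the `HH:MM:SS.fffffff` strings shown in the summaries; `parse_duration` is a hypothetical helper written for this illustration.

```python
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse a duration formatted as HH:MM:SS.fffffff."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


first_run = parse_duration("00:30:01.4638023")   # one task per node
second_run = parse_duration("00:08:48.2423500")  # four tasks per node

speedup = first_run / second_run    # dividing timedeltas yields a float
fraction = second_run / first_run
print(f"speedup: {speedup:.2f}x, fraction of original time: {fraction:.0%}")
```

The speedup is somewhat below the ideal factor of four. That is expected when tasks share a node's resources and the job includes some fixed overhead.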

Note

The job durations in the summaries above do not include pool creation time. Each of the jobs above was submitted to a previously created pool whose compute nodes were in the Idle state at submission time.

Next steps

Batch Explorer Heat Map

Batch Explorer is a free, feature-rich, standalone client tool that helps you create, debug, and monitor Azure Batch applications. Batch Explorer contains a Heat Map feature that provides visualization of task execution. When you're executing the ParallelTasks sample application, you can use the Heat Map feature to easily visualize the execution of parallel tasks on each node.