Use low-priority VMs with Batch

Azure Batch offers low-priority virtual machines (VMs) to reduce the cost of Batch workloads. Because low-priority VMs provide a large amount of compute power at a very low cost, they make new types of Batch workloads possible.

Low-priority VMs take advantage of surplus capacity in Azure. When you specify low-priority VMs in your pools, Azure Batch uses this surplus whenever it is available.

The tradeoff for using low-priority VMs is that those VMs may not be available to be allocated, or may be preempted at any time, depending on available capacity. For this reason, low-priority VMs are most suitable for certain types of workloads. Use low-priority VMs for batch and asynchronous processing workloads where the job completion time is flexible and the work is distributed across many VMs.

Low-priority VMs are offered at a significantly reduced price compared with dedicated VMs. For pricing details, see Batch Pricing.

Use cases for low-priority VMs

Given the characteristics of low-priority VMs, which workloads can and cannot use them? In general, batch processing workloads are a good fit, because jobs are broken into many parallel tasks, or there are many jobs that are scaled out and distributed across many VMs.

  • To maximize use of surplus capacity in Azure, suitable jobs can scale out.

  • Occasionally VMs may not be available or may be preempted, which reduces the capacity available to jobs and may lead to task interruption and reruns. Jobs must therefore be flexible in the time they can take to run.

  • Jobs with longer tasks may be impacted more if interrupted. If long-running tasks implement checkpointing to save progress as they execute, the impact of interruption is reduced. Tasks with shorter execution times tend to work best with low-priority VMs, because the impact of interruption is far less.

  • Long-running MPI jobs that use multiple VMs are not well suited to low-priority VMs, because one preempted VM can force the whole job to run again.
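
The checkpointing advice above can be sketched as follows. This is a minimal illustration under stated assumptions, not a Batch API: the CheckpointedWork class, the checkpoint file format, and the per-item callback are all hypothetical, and File.Move with an overwrite argument assumes .NET Core 3.0 or later. In a real task the checkpoint would need to be written to durable storage rather than the node's local disk, because data stored locally is lost when a VM is preempted.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Batch SDK): processes items
// 0..total-1 and records progress so that a requeued task can resume
// from the last completed item instead of starting over.
public static class CheckpointedWork
{
    // Returns the index of the next item to process (0 if no usable checkpoint exists).
    public static int LoadCheckpoint(string path)
    {
        if (!File.Exists(path)) return 0;
        return int.TryParse(File.ReadAllText(path), out int next) ? next : 0;
    }

    // Writes the checkpoint to a temporary file first and then moves it into
    // place, so an interruption mid-write does not leave a truncated checkpoint.
    public static void SaveCheckpoint(string path, int nextItem)
    {
        string tmp = path + ".tmp";
        File.WriteAllText(tmp, nextItem.ToString());
        File.Move(tmp, path, overwrite: true);
    }

    // Processes items starting from the checkpoint and returns how many
    // items this invocation actually processed.
    public static int Run(string path, int total, Action<int> processItem)
    {
        int start = LoadCheckpoint(path);
        int processed = 0;
        for (int i = start; i < total; i++)
        {
            processItem(i);              // do one unit of work
            SaveCheckpoint(path, i + 1); // record that item i is complete
            processed++;
        }
        return processed;
    }
}
```

Checkpointing after every item keeps the amount of repeated work small, at the cost of one write per item; a task with very cheap items might checkpoint less often.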

Some examples of batch processing use cases well suited to low-priority VMs are:

  • Development and testing: In particular, if large-scale solutions are being developed, significant savings can be realized. All types of testing can benefit, but large-scale load testing and regression testing are especially good uses.

  • Supplementing on-demand capacity: Low-priority VMs can supplement regular dedicated VMs. When they are available, jobs can scale out and therefore complete sooner at lower cost; when they are not available, the baseline of dedicated VMs remains available.

  • Flexible job execution time: If there is flexibility in the time jobs have to complete, potential drops in capacity can be tolerated. With the addition of low-priority VMs, however, jobs frequently run faster and at lower cost.

Batch pools can be configured to use low-priority VMs in a few ways, depending on the flexibility in job execution time:

  • A pool can use low-priority VMs exclusively. In this case, Batch recovers any preempted capacity when it becomes available. This configuration is the cheapest way to execute jobs, because only low-priority VMs are used.

  • Low-priority VMs can be used in conjunction with a fixed baseline of dedicated VMs. The fixed number of dedicated VMs ensures there is always some capacity to keep a job progressing.

  • A pool can use a dynamic mix of dedicated and low-priority VMs, so that the cheaper low-priority VMs are used exclusively when available, and the full-priced dedicated VMs are scaled up when required. This configuration keeps a minimum amount of capacity available to keep jobs progressing.

Batch support for low-priority VMs

Azure Batch provides several capabilities that make it easy to consume and benefit from low-priority VMs:

  • Batch pools can contain both dedicated VMs and low-priority VMs. The number of each type of VM can be specified when a pool is created, or changed at any time for an existing pool, using the explicit resize operation or autoscale. Job and task submission can remain unchanged, regardless of the VM types in the pool. You can also configure a pool to use only low-priority VMs to run jobs as cheaply as possible, but spin up dedicated VMs if the capacity drops below a minimum threshold, to keep jobs running.

  • Batch pools automatically seek the target number of low-priority VMs. If VMs are preempted, Batch attempts to replace the lost capacity and return to the target.

  • When tasks are interrupted, Batch detects the interruption and automatically requeues the tasks to run again.

  • Low-priority VMs have a separate vCPU quota that differs from the one for dedicated VMs. The quota for low-priority VMs is higher than the quota for dedicated VMs, because low-priority VMs cost less. For more information, see Batch service quotas and limits.

Note

Low-priority VMs are not currently supported for Batch accounts created in user subscription mode.

Create and update pools

A Batch pool can contain both dedicated and low-priority VMs (also referred to as compute nodes). You can set the target number of compute nodes for both dedicated and low-priority VMs. The target number of nodes specifies the number of VMs you want to have in the pool.

For example, to create a pool using Azure cloud service VMs with a target of 5 dedicated VMs and 20 low-priority VMs:

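// Define a cloud service pool with 5 dedicated and 20 low-priority nodes
CloudPool pool = batchClient.PoolOperations.CreatePool(
    poolId: "cspool",
    targetDedicatedComputeNodes: 5,
    targetLowPriorityComputeNodes: 20,
    virtualMachineSize: "Standard_D2_v2",
    cloudServiceConfiguration: new CloudServiceConfiguration(osFamily: "5") // WS 2016
);

// Submit the pool definition to the Batch service
pool.Commit();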

To create a pool using Azure virtual machines (in this case Linux VMs) with a target of 5 dedicated VMs and 20 low-priority VMs:

// Reference the Ubuntu Server 16.04 LTS marketplace image
ImageReference imageRef = new ImageReference(
    publisher: "Canonical",
    offer: "UbuntuServer",
    sku: "16.04-LTS",
    version: "latest");

// Pair the image with the matching Batch node agent SKU
VirtualMachineConfiguration virtualMachineConfiguration =
    new VirtualMachineConfiguration(
        imageReference: imageRef,
        nodeAgentSkuId: "batch.node.ubuntu 16.04");

// Create the pool
pool = batchClient.PoolOperations.CreatePool(
    poolId: "vmpool",
    targetDedicatedComputeNodes: 5,
    targetLowPriorityComputeNodes: 20,
    virtualMachineSize: "Standard_D2_v2",
    virtualMachineConfiguration: virtualMachineConfiguration);

// Submit the pool definition to the Batch service
pool.Commit();

You can get the current number of nodes for both dedicated and low-priority VMs:

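// Retrieve the pool from the service to read its current node counts
CloudPool pool1 = batchClient.PoolOperations.GetPool("vmpool");
int? numDedicated = pool1.CurrentDedicatedComputeNodes;
int? numLowPri = pool1.CurrentLowPriorityComputeNodes;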

Pool nodes have a property that indicates whether the node is a dedicated or low-priority VM:

bool? isNodeDedicated = poolNode.IsDedicated;

When one or more nodes in a pool are preempted, a list nodes operation on the pool still returns those nodes. The current number of low-priority nodes remains unchanged, but those nodes have their state set to Preempted. Batch attempts to find replacement VMs and, if successful, the nodes go through the Creating and then Starting states before becoming available for task execution, just like new nodes.

Scale a pool containing low-priority VMs

As with pools consisting solely of dedicated VMs, you can scale a pool containing low-priority VMs by calling the Resize method or by using autoscale.

The pool resize operation takes a second optional parameter, targetLowPriorityComputeNodes, that updates the target number of low-priority nodes:

pool.Resize(targetDedicatedComputeNodes: 0, targetLowPriorityComputeNodes: 25);

The pool autoscale formula supports low-priority VMs as follows:

  • You can get or set the value of the service-defined variable $TargetLowPriorityNodes.

  • You can get the value of the service-defined variable $CurrentLowPriorityNodes.

  • You can get the value of the service-defined variable $PreemptedNodeCount. This variable returns the number of nodes in the preempted state, which lets you scale the number of dedicated nodes up or down depending on how many preempted nodes are unavailable.
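
As an illustration of how these variables can be combined, the following autoscale formula sketch keeps the pool at a fixed total size and substitutes a dedicated node for each preempted low-priority node. The user-defined variable maxNumberofVMs, its value of 25, and the 180-second sampling window are assumptions chosen for the example, not recommended settings; a production formula would also need to account for periods when too few samples are available.

```
maxNumberofVMs = 25;
$TargetDedicatedNodes = min(maxNumberofVMs, $PreemptedNodeCount.GetSample(180 * TimeInterval_Second));
$TargetLowPriorityNodes = min(maxNumberofVMs, maxNumberofVMs - $TargetDedicatedNodes);
$NodeDeallocationOption = taskcompletion;
```

With no preempted nodes, the formula targets 0 dedicated and 25 low-priority nodes; as preemptions are observed, the dedicated target rises and the low-priority target falls by the same amount.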

Jobs and tasks

Jobs and tasks require little additional configuration for low-priority nodes. The only additions are:

  • The JobManagerTask property of a job has a new property, AllowLowPriorityNode. When this property is true, the job manager task can be scheduled on either a dedicated or a low-priority node. When this property is false, the job manager task is scheduled on a dedicated node only.

  • An environment variable, AZ_BATCH_NODE_IS_DEDICATED, is available to a task application so that it can determine whether it is running on a low-priority or a dedicated node.
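
A task application might read the environment variable along these lines. This is a sketch only: the NodeKind class and its methods are hypothetical names, and the code assumes the variable holds the text "true" or "false"; any other value, or an unset variable (for example, when running outside Batch), is reported as unknown.

```csharp
using System;

// Hypothetical helper (not part of the Batch SDK) for checking the node
// type from inside a task application.
public static class NodeKind
{
    public const string VariableName = "AZ_BATCH_NODE_IS_DEDICATED";

    // Interprets the raw variable value.
    // Assumption: the value is "true" or "false" (case-insensitive).
    // Returns null when the value is missing or unrecognized.
    public static bool? ParseIsDedicated(string value)
    {
        return bool.TryParse(value, out bool isDedicated) ? isDedicated : (bool?)null;
    }

    // Reads the variable from the current process environment.
    public static bool? IsDedicated()
    {
        return ParseIsDedicated(Environment.GetEnvironmentVariable(VariableName));
    }
}
```

A task could, for instance, call NodeKind.IsDedicated() at startup and checkpoint more frequently when the result is false, since a low-priority node can be preempted at any time.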

Handling preemption

VMs may occasionally be preempted. When preemption happens, Batch does the following:

  • The preempted VMs have their state updated to Preempted.
  • If tasks were running on the preempted nodes, those tasks are requeued and run again.
  • The VM is effectively deleted, so any data stored locally on the VM is lost.
  • The pool continually attempts to reach the target number of low-priority nodes. When replacement capacity is found, the nodes keep their IDs but are reinitialized, going through the Creating and Starting states before they are available for task scheduling.
  • Preemption counts are available as a metric in the Azure portal.

Metrics

New metrics are available in the Azure portal for low-priority nodes. These metrics are:

  • Low-Priority Node Count
  • Low-Priority Core Count
  • Preempted Node Count

To view metrics in the Azure portal:

  1. Navigate to your Batch account in the portal, and view the settings for your Batch account.
  2. Select Metrics from the Monitoring section.
  3. Select the metrics you want from the Available Metrics list.

Figure: Metrics for low-priority nodes

Next steps

  • Read the Batch feature overview for developers, which is essential information for anyone preparing to use Batch. The article contains more detailed information about Batch service resources such as pools, nodes, jobs, and tasks, and the many API features that you can use while building your Batch application.
  • Learn about the Batch APIs and tools available for building Batch solutions.