Preview - Create and manage multiple node pools for a cluster in Azure Kubernetes Service (AKS)

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. The initial number of nodes and their size (SKU) are defined when you create an AKS cluster, which creates a default node pool. To support applications that have different compute or storage demands, you can create additional node pools. For example, use these additional node pools to provide GPUs for compute-intensive applications, or access to high-performance SSD storage.

Note

This feature enables greater control over how to create and manage multiple node pools. As a result, separate commands are required for create, update, and delete operations. Previously, cluster operations through az aks create or az aks update used the managedCluster API and were the only option to change your control plane and a single node pool. This feature exposes a separate operation set for agent pools through the agentPool API and requires use of the az aks nodepool command set to execute operations on an individual node pool.

This article shows you how to create and manage multiple node pools in an AKS cluster. This feature is currently in preview.

Important

AKS preview features are self-service opt-in. Previews are provided "as-is" and "as available" and are excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features are not meant for production use. For additional information, please see the following support articles:

Before you begin

You need the Azure CLI version 2.0.61 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.

Install aks-preview CLI extension

To use multiple node pools, you need the aks-preview CLI extension version 0.4.1 or higher. Install the aks-preview Azure CLI extension using the az extension add command, then check for any available updates using the az extension update command:

# Install the aks-preview extension
az extension add --name aks-preview

# Update the extension to make sure you have the latest version installed
az extension update --name aks-preview
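If you want to script the 0.4.1 minimum-version check rather than eyeball it, the following is an illustrative sketch (not part of the official instructions). The commented-out az extension show call assumes a logged-in Azure CLI; the helper itself is plain shell:

```shell
# Illustrative helper: succeeds when version $1 >= version $2,
# comparing dot-separated versions with GNU sort's -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example usage (assumes the Azure CLI is installed and logged in):
# installed=$(az extension show --name aks-preview --query version -o tsv)
# version_ge "$installed" "0.4.1" || az extension update --name aks-preview
```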

Register multiple node pool feature provider

To create an AKS cluster that can use multiple node pools, first enable two feature flags on your subscription. Multi-node pool clusters use a virtual machine scale set (VMSS) to manage the deployment and configuration of the Kubernetes nodes. Register the MultiAgentpoolPreview and VMSSPreview feature flags using the az feature register command, as shown in the following example:

Warning

When you register a feature on a subscription, you can't currently un-register that feature. After you enable some preview features, defaults may be used for all AKS clusters subsequently created in the subscription. Don't enable preview features on production subscriptions. Use a separate subscription to test preview features and gather feedback.

az feature register --name MultiAgentpoolPreview --namespace Microsoft.ContainerService
az feature register --name VMSSPreview --namespace Microsoft.ContainerService

Note

Any AKS cluster you create after you've successfully registered MultiAgentpoolPreview will use this preview cluster experience. To continue to create regular, fully supported clusters, don't enable preview features on production subscriptions. Use a separate test or development Azure subscription for testing preview features.

It takes a few minutes for the status to show Registered. You can check on the registration status using the az feature list command:

az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/MultiAgentpoolPreview')].{Name:name,State:properties.state}"
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/VMSSPreview')].{Name:name,State:properties.state}"
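If you'd rather wait in a loop than re-run az feature list by hand, a small polling script can help. This is a hedged sketch, not from the official guidance; the commented-out az feature show call assumes a logged-in Azure CLI:

```shell
# Illustrative helper: succeeds only when a feature state string is "Registered".
is_registered() { [ "$1" = "Registered" ]; }

# Example polling loop (assumes a logged-in Azure CLI):
# until is_registered "$(az feature show --name MultiAgentpoolPreview \
#         --namespace Microsoft.ContainerService --query properties.state -o tsv)"; do
#   echo "Still registering..."; sleep 30
# done
```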

When ready, refresh the registration of the Microsoft.ContainerService resource provider using the az provider register command:

az provider register --namespace Microsoft.ContainerService

Limitations

The following limitations apply when you create and manage AKS clusters that support multiple node pools:

  • Multiple node pools are only available for clusters created after you've successfully registered the MultiAgentpoolPreview and VMSSPreview features for your subscription. You can't add or manage node pools with an existing AKS cluster created before these features were successfully registered.
  • You can't delete the first node pool.
  • The HTTP application routing add-on can't be used.
  • You can't add, update, or delete node pools using an existing Resource Manager template as with most operations. Instead, use a separate Resource Manager template to make changes to node pools in an AKS cluster.

While this feature is in preview, the following additional limitations apply:

  • The AKS cluster can have a maximum of eight node pools.
  • The AKS cluster can have a maximum of 400 nodes across those eight node pools.
  • All node pools must reside in the same subnet.

Create an AKS cluster

To get started, create an AKS cluster with a single node pool. The following example uses the az group create command to create a resource group named myResourceGroup in the eastus region. An AKS cluster named myAKSCluster is then created using the az aks create command. A --kubernetes-version of 1.13.9 is used to show how to upgrade a node pool in a following step. You can specify any supported Kubernetes version.

# Create a resource group in East US
az group create --name myResourceGroup --location eastus

# Create a basic single-node AKS cluster
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-vmss \
    --node-count 1 \
    --generate-ssh-keys \
    --kubernetes-version 1.13.9

It takes a few minutes to create the cluster.

When the cluster is ready, use the az aks get-credentials command to get the cluster credentials for use with kubectl:

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

Add a node pool

The cluster created in the previous step has a single node pool. Let's add a second node pool using the az aks nodepool add command. The following example creates a node pool named mynodepool that runs 3 nodes:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 3 \
    --kubernetes-version 1.12.7

To see the status of your node pools, use the az aks nodepool list command and specify your resource group and cluster name:

az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster

The following example output shows that mynodepool has been successfully created with three nodes in the node pool. When the AKS cluster was created in the previous step, a default nodepool1 was created with a node count of 1.

$ az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 3,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.12.7",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 1,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.9",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

Tip

If no OrchestratorVersion or VmSize is specified when you add a node pool, the nodes are created based on the defaults for the AKS cluster. In this example, that was Kubernetes version 1.13.9 and node size of Standard_DS2_v2.

Upgrade a node pool

Note

Upgrade and scale operations on a cluster or node pool are mutually exclusive. You cannot upgrade and scale a cluster or node pool simultaneously. Instead, each operation type must complete on the target resource before the next request is made against that same resource. Read more about this in our troubleshooting guide.

When your AKS cluster was created in the first step, a --kubernetes-version of 1.13.9 was specified. This sets the Kubernetes version for both the control plane and the initial node pool. There are different commands for upgrading the Kubernetes version of the control plane and of a node pool. The az aks upgrade command is used to upgrade the control plane, while az aks nodepool upgrade is used to upgrade an individual node pool.

Let's upgrade mynodepool to Kubernetes 1.13.9. Use the az aks nodepool upgrade command to upgrade the node pool, as shown in the following example:

az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --kubernetes-version 1.13.9 \
    --no-wait

Tip

To upgrade the control plane to 1.14.5, run az aks upgrade -k 1.14.5.

List the status of your node pools again using the az aks nodepool list command. The following example shows that mynodepool is in the Upgrading state to 1.13.9:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 3,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Upgrading",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 1,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes to upgrade the nodes to the specified version.

As a best practice, you should upgrade all node pools in an AKS cluster to the same Kubernetes version. The ability to upgrade individual node pools lets you perform a rolling upgrade and schedule pods between node pools to maintain application uptime within the constraints mentioned above.

Note

Kubernetes uses the standard Semantic Versioning scheme. The version number is expressed as x.y.z, where x is the major version, y is the minor version, and z is the patch version. For example, in version 1.12.6, 1 is the major version, 12 is the minor version, and 6 is the patch version. The Kubernetes version of the control plane as well as the initial node pool is set during cluster creation. All additional node pools have their Kubernetes version set when they are added to the cluster. The Kubernetes versions may differ between node pools as well as between a node pool and the control plane, but the following restrictions apply:

  • The node pool version must have the same major version as the control plane.
  • The node pool version may be one minor version less than the control plane version.
  • The node pool version may be any patch version as long as the other two constraints are followed.
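To make those rules concrete, here is an illustrative shell check that applies the three constraints to plain x.y.z version strings. This is a sketch for clarity only, not an official AKS validation:

```shell
# Succeeds when a node pool version ($1) is allowed alongside a control plane
# version ($2) per the rules above: same major version, and a minor version
# equal to, or one less than, the control plane's (any patch version).
versions_compatible() {
  pool=$1; cp=$2
  pool_major=${pool%%.*}; pool_rest=${pool#*.}; pool_minor=${pool_rest%%.*}
  cp_major=${cp%%.*};     cp_rest=${cp#*.};     cp_minor=${cp_rest%%.*}
  # Major versions must match.
  [ "$pool_major" -eq "$cp_major" ] || return 1
  # Pool minor may equal the control plane minor, or be exactly one less.
  [ "$pool_minor" -eq "$cp_minor" ] || [ "$pool_minor" -eq $((cp_minor - 1)) ]
}

# versions_compatible "1.12.7" "1.13.9"   # allowed: one minor version behind
# versions_compatible "1.11.0" "1.13.9"   # rejected: two minor versions behind
```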

To upgrade the Kubernetes version of the control plane, use az aks upgrade. If your cluster only has one node pool, the az aks upgrade command will also upgrade the Kubernetes version of the node pool.

Scale a node pool manually

As your application workload demands change, you may need to scale the number of nodes in a node pool. The number of nodes can be scaled up or down.

To scale the number of nodes in a node pool, use the az aks nodepool scale command. The following example scales the number of nodes in mynodepool to 5:

az aks nodepool scale \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 5 \
    --no-wait

List the status of your node pools again using the az aks nodepool list command. The following example shows that mynodepool is in the Scaling state with a new count of 5 nodes:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 5,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Scaling",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 1,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes for the scale operation to complete.

Scale a specific node pool automatically by enabling the cluster autoscaler

AKS offers a separate feature in preview to automatically scale node pools with a feature called the cluster autoscaler. This feature is an AKS add-on that can be enabled per node pool, with unique minimum and maximum scale counts per node pool. Learn how to use the cluster autoscaler per node pool.

Delete a node pool

If you no longer need a pool, you can delete it and remove the underlying VM nodes. To delete a node pool, use the az aks nodepool delete command and specify the node pool name. The following example deletes the mynodepool created in the previous steps:

Warning

There are no recovery options for data loss that may occur when you delete a node pool. If pods can't be scheduled on other node pools, those applications are unavailable. Make sure you don't delete a node pool when in-use applications don't have data backups or the ability to run on other node pools in your cluster.

az aks nodepool delete -g myResourceGroup --cluster-name myAKSCluster --name mynodepool --no-wait

The following example output from the az aks nodepool list command shows that mynodepool is in the Deleting state:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 5,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Deleting",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 1,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes to delete the nodes and the node pool.

Specify a VM size for a node pool

In the previous examples to create a node pool, a default VM size was used for the nodes created in the cluster. A more common scenario is for you to create node pools with different VM sizes and capabilities. For example, you may create a node pool that contains nodes with large amounts of CPU or memory, or a node pool that provides GPU support. In the next step, you use taints and tolerations to tell the Kubernetes scheduler how to limit which pods can run on these nodes.

In the following example, create a GPU-based node pool that uses the Standard_NC6 VM size. These VMs are powered by the NVIDIA Tesla K80 card. For information on available VM sizes, see Sizes for Linux virtual machines in Azure.

Create a node pool using the az aks nodepool add command again. This time, specify the name gpunodepool, and use the --node-vm-size parameter to specify the Standard_NC6 size:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunodepool \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --no-wait

The following example output from the az aks nodepool list command shows that gpunodepool is Creating nodes with the specified VmSize:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 1,
    ...
    "name": "gpunodepool",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Creating",
    ...
    "vmSize": "Standard_NC6",
    ...
  },
  {
    ...
    "count": 1,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.9",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes for the gpunodepool to be successfully created.

Schedule pods using taints and tolerations

You now have two node pools in your cluster - the default node pool initially created, and the GPU-based node pool. Use the kubectl get nodes command to view the nodes in your cluster. The following example output shows one node in each node pool:

$ kubectl get nodes

NAME                                 STATUS   ROLES   AGE     VERSION
aks-gpunodepool-28993262-vmss000000  Ready    agent   4m22s   v1.13.9
aks-nodepool1-28993262-vmss000000    Ready    agent   115m    v1.13.9

The Kubernetes scheduler can use taints and tolerations to restrict what workloads can run on nodes.

  • A taint is applied to a node to indicate that only specific pods can be scheduled on it.
  • A toleration is then applied to a pod to allow it to tolerate a node's taint.

For more information on how to use advanced Kubernetes scheduler features, see Best practices for advanced scheduler features in AKS.

In this example, apply a taint to your GPU-based node using the kubectl taint node command. Specify the name of your GPU-based node from the output of the previous kubectl get nodes command. The taint is applied as a key:value pair followed by a scheduling effect. The following example uses the sku=gpu pair with the NoSchedule effect, which prevents pods without a matching toleration from being scheduled on the node:

kubectl taint node aks-gpunodepool-28993262-vmss000000 sku=gpu:NoSchedule

The following basic example YAML manifest uses a toleration to allow the Kubernetes scheduler to run an NGINX pod on the GPU-based node. For a more appropriate, but time-intensive, example that runs a Tensorflow job against the MNIST dataset, see Use GPUs for compute-intensive workloads on AKS.

Create a file named gpu-toleration.yaml and copy in the following example YAML:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - image: nginx:1.15.9
    name: mypod
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 1
        memory: 2G
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

Schedule the pod using the kubectl apply -f gpu-toleration.yaml command:

kubectl apply -f gpu-toleration.yaml

It takes a few seconds to schedule the pod and pull the NGINX image. Use the kubectl describe pod command to view the pod status. The following condensed example output shows the sku=gpu:NoSchedule toleration is applied. In the events section, the scheduler has assigned the pod to the aks-gpunodepool-28993262-vmss000000 GPU-based node:

$ kubectl describe pod mypod

[...]
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 sku=gpu:NoSchedule
Events:
  Type    Reason     Age    From                                          Message
  ----    ------     ----   ----                                          -------
  Normal  Scheduled  4m48s  default-scheduler                             Successfully assigned default/mypod to aks-gpunodepool-28993262-vmss000000
  Normal  Pulling    4m47s  kubelet, aks-gpunodepool-28993262-vmss000000  pulling image "nginx:1.15.9"
  Normal  Pulled     4m43s  kubelet, aks-gpunodepool-28993262-vmss000000  Successfully pulled image "nginx:1.15.9"
  Normal  Created    4m40s  kubelet, aks-gpunodepool-28993262-vmss000000  Created container
  Normal  Started    4m40s  kubelet, aks-gpunodepool-28993262-vmss000000  Started container

Only pods that have this toleration applied can be scheduled on nodes in gpunodepool. Any other pod would be scheduled in the nodepool1 node pool. If you create additional node pools, you can use additional taints and tolerations to limit what pods can be scheduled on those node resources.
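Note that the taint keeps other workloads off the GPU nodes, but by itself it doesn't steer GPU workloads onto them - a pod with only the toleration may still land in nodepool1. One common complement is to pair the toleration with a nodeSelector in the pod spec. The following is a hypothetical sketch: the agentpool node label and its value are assumptions about how AKS labels nodes, so verify them on your cluster with kubectl get nodes --show-labels before relying on this.

```yaml
# Hypothetical pod spec fragment: the toleration lets the pod onto the
# tainted GPU nodes, and the nodeSelector keeps it off every other pool.
spec:
  nodeSelector:
    agentpool: gpunodepool   # assumed AKS-provided node label; verify on your cluster
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```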

Manage node pools using a Resource Manager template

When you use an Azure Resource Manager template to create and manage resources, you can typically update the settings in your template and redeploy to update the resource. With node pools in AKS, the initial node pool profile can't be updated once the AKS cluster has been created. This behavior means that you can't update an existing Resource Manager template, make a change to the node pools, and redeploy. Instead, you must create a separate Resource Manager template that updates only the agent pools for an existing AKS cluster.

Create a template such as aks-agentpools.json and paste the following example manifest. This example template configures the following settings:

  • Updates the Linux agent pool named myagentpool to run three nodes.
  • Sets the nodes in the node pool to run Kubernetes version 1.13.9.
  • Defines the node size as Standard_DS2_v2.

Edit these values as needed to update, add, or delete node pools:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "clusterName": {
      "type": "string",
      "metadata": {
        "description": "The name of your existing AKS cluster."
      }
    },
    "location": {
      "type": "string",
      "metadata": {
        "description": "The location of your existing AKS cluster."
      }
    },
    "agentPoolName": {
      "type": "string",
      "defaultValue": "myagentpool",
      "metadata": {
        "description": "The name of the agent pool to create or update."
      }
    },
    "vnetSubnetId": {
      "type": "string",
      "defaultValue": "",
      "metadata": {
        "description": "The Vnet subnet resource ID for your existing AKS cluster."
      }
    }
  },
  "variables": {
    "apiVersion": {
      "aks": "2019-04-01"
    },
    "agentPoolProfiles": {
      "maxPods": 30,
      "osDiskSizeGB": 0,
      "agentCount": 3,
      "agentVmSize": "Standard_DS2_v2",
      "osType": "Linux",
      "vnetSubnetId": "[parameters('vnetSubnetId')]"
    }
  },
  "resources": [
    {
      "apiVersion": "2019-04-01",
      "type": "Microsoft.ContainerService/managedClusters/agentPools",
      "name": "[concat(parameters('clusterName'),'/', parameters('agentPoolName'))]",
      "location": "[parameters('location')]",
      "properties": {
            "maxPods": "[variables('agentPoolProfiles').maxPods]",
            "osDiskSizeGB": "[variables('agentPoolProfiles').osDiskSizeGB]",
            "count": "[variables('agentPoolProfiles').agentCount]",
            "vmSize": "[variables('agentPoolProfiles').agentVmSize]",
            "osType": "[variables('agentPoolProfiles').osType]",
            "storageProfile": "ManagedDisks",
      "type": "VirtualMachineScaleSets",
            "vnetSubnetID": "[variables('agentPoolProfiles').vnetSubnetId]",
            "orchestratorVersion": "1.13.9"
      }
    }
  ]
}

Deploy this template using the az group deployment create command, as shown in the following example. You are prompted for the existing AKS cluster name and location:

az group deployment create \
    --resource-group myResourceGroup \
    --template-file aks-agentpools.json

It may take a few minutes to update your AKS cluster depending on the node pool settings and operations you define in your Resource Manager template.

Assign a public IP per node in a node pool

Note

During preview, there is a limitation on using this feature with the Standard Load Balancer SKU in AKS (preview), due to possible load balancer rules conflicting with VM provisioning. While in preview, use the Basic Load Balancer SKU if you need to assign a public IP per node.

AKS nodes do not require their own public IP addresses for communication. However, some scenarios may require nodes in a node pool to have their own public IP addresses. An example is gaming, where a console needs to make a direct connection to a cloud virtual machine to minimize hops. This can be achieved by registering for a separate preview feature, Node Public IP (preview).

az feature register --name NodePublicIPPreview --namespace Microsoft.ContainerService

After successful registration, deploy an Azure Resource Manager template following the same instructions as above, adding the boolean property "enableNodePublicIP" to the agentPoolProfiles. Set this to true; by default it is set to false if not specified. This is a create-time only property and requires a minimum API version of 2019-06-01. This can be applied to both Linux and Windows node pools.

"agentPoolProfiles":[  
    {  
      "maxPods": 30,
      "osDiskSizeGB": 0,
      "agentCount": 3,
      "agentVmSize": "Standard_DS2_v2",
      "osType": "Linux",
      "vnetSubnetId": "[parameters('vnetSubnetId')]",
      "enableNodePublicIP":true
    }

Clean up resources

In this article, you created an AKS cluster that includes GPU-based nodes. To reduce unnecessary cost, you may want to delete the gpunodepool, or the whole AKS cluster.

To delete the GPU-based node pool, use the az aks nodepool delete command, as shown in the following example:

az aks nodepool delete -g myResourceGroup --cluster-name myAKSCluster --name gpunodepool

To delete the cluster itself, use the az group delete command to delete the AKS resource group:

az group delete --name myResourceGroup --yes --no-wait

Next steps

In this article, you learned how to create and manage multiple node pools in an AKS cluster. For more information about how to control pods across node pools, see Best practices for advanced scheduler features in AKS.

To create and use Windows Server container node pools, see Create a Windows Server container in AKS.