Preview - Create and manage multiple node pools for a cluster in Azure Kubernetes Service (AKS)

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. The initial number of nodes and their size (SKU) are defined when you create an AKS cluster, which creates a default node pool. To support applications that have different compute or storage demands, you can create additional node pools. For example, use these additional node pools to provide GPUs for compute-intensive applications, or access to high-performance SSD storage.

Note

This feature enables more granular control over how to create and manage multiple node pools. As a result, separate commands are required for create, update, and delete operations. Previously, cluster operations through az aks create or az aks update used the managedCluster API and were the only way to change your control plane and a single node pool. This feature exposes a separate operation set for agent pools through the agentPool API and requires use of the az aks nodepool command set to execute operations on an individual node pool.
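
To see which node pool operations your installed extension version exposes, you can list the az aks nodepool subcommands with the built-in help (the exact set of operations may vary by extension version):

az aks nodepool --help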

This article shows you how to create and manage multiple node pools in an AKS cluster. This feature is currently in preview.

Important

AKS preview features are self-service and opt-in. Previews are provided "as-is" and "as available" and are excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features are not meant for production use. For additional information, see the following support articles:

Before you begin

You need the Azure CLI version 2.0.61 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.

Install aks-preview CLI extension

To use multiple node pools, you need the aks-preview CLI extension version 0.4.16 or higher. Install the aks-preview Azure CLI extension using the az extension add command, then check for any available updates using the az extension update command:

# Install the aks-preview extension
az extension add --name aks-preview

# Update the extension to make sure you have the latest version installed
az extension update --name aks-preview

Register multiple node pool feature provider

To create an AKS cluster that can use multiple node pools, first enable a feature flag on your subscription. Register the MultiAgentpoolPreview feature flag using the az feature register command as shown in the following example:

Caution

When you register a feature on a subscription, you can't currently un-register that feature. After you enable some preview features, defaults may be used for all AKS clusters then created in the subscription. Don't enable preview features on production subscriptions. Use a separate subscription to test preview features and gather feedback.

az feature register --name MultiAgentpoolPreview --namespace Microsoft.ContainerService

Note

Any AKS cluster you create after you've successfully registered the MultiAgentpoolPreview feature flag uses this preview cluster experience. To continue to create regular, fully supported clusters, don't enable preview features on production subscriptions. Use a separate test or development Azure subscription for testing preview features.

It takes a few minutes for the status to show Registered. You can check on the registration status using the az feature list command:

az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/MultiAgentpoolPreview')].{Name:name,State:properties.state}"

When ready, refresh the registration of the Microsoft.ContainerService resource provider using the az provider register command:

az provider register --namespace Microsoft.ContainerService
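
If you want to confirm that the resource provider registration has completed, you can check its state with the az provider show command; the output should eventually show Registered:

az provider show --namespace Microsoft.ContainerService --query registrationState -o tsv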

Limitations

The following limitations apply when you create and manage AKS clusters that support multiple node pools:

  • Multiple node pools are only available for clusters created after you've successfully registered the MultiAgentpoolPreview feature for your subscription. You can't add or manage node pools with an existing AKS cluster created before this feature was successfully registered.
  • You can't delete the default (first) node pool.
  • The HTTP application routing add-on can't be used.
  • Unlike most operations, you can't add or delete node pools by updating an existing Resource Manager template for the cluster. Instead, use a separate Resource Manager template to make changes only to the node pools in an AKS cluster.
  • The name of a node pool must start with a lowercase letter and can only contain alphanumeric characters. For Linux node pools the length must be between 1 and 12 characters, for Windows node pools the length must be between 1 and 6 characters.

While this feature is in preview, the following additional limitations apply:

  • The AKS cluster can have a maximum of eight node pools.
  • The AKS cluster can have a maximum of 400 nodes across those eight node pools.
  • All node pools must reside in the same subnet.

Create an AKS cluster

To get started, create an AKS cluster with a single node pool. The following example uses the az group create command to create a resource group named myResourceGroup in the eastus region. An AKS cluster named myAKSCluster is then created using the az aks create command. A --kubernetes-version of 1.13.10 is used to show how to update a node pool in a following step. You can specify any supported Kubernetes version.

It is highly recommended to use the Standard SKU load balancer when utilizing multiple node pools. Read this document to learn more about using Standard Load Balancers with AKS.

# Create a resource group in East US
az group create --name myResourceGroup --location eastus

# Create a basic single-node AKS cluster
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --vm-set-type VirtualMachineScaleSets \
    --node-count 2 \
    --generate-ssh-keys \
    --kubernetes-version 1.13.10 \
    --load-balancer-sku standard

It takes a few minutes to create the cluster.

Note

To ensure your cluster operates reliably, you should run at least two nodes in the default node pool, because essential system services run across this node pool.

When the cluster is ready, use the az aks get-credentials command to get the cluster credentials for use with kubectl:

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
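
To verify the connection to your cluster, you can optionally list the nodes of the default node pool and confirm that two nodes are reported as Ready:

kubectl get nodes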

Add a node pool

The cluster created in the previous step has a single node pool. Let's add a second node pool using the az aks nodepool add command. The following example creates a node pool named mynodepool that runs 3 nodes:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 3 \
    --kubernetes-version 1.12.7

Note

The name of a node pool must start with a lowercase letter and can only contain alphanumeric characters. For Linux node pools the length must be between 1 and 12 characters, for Windows node pools the length must be between 1 and 6 characters.

To see the status of your node pools, use the az aks nodepool list command and specify your resource group and cluster name:

az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster

The following example output shows that mynodepool has been successfully created with three nodes in the node pool. When the AKS cluster was created in the previous step, a default nodepool1 was created with a node count of 2.

$ az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 3,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.12.7",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.10",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

Tip

If no OrchestratorVersion or VmSize is specified when you add a node pool, the nodes are created based on the defaults for the AKS cluster. In this example, that was Kubernetes version 1.13.10 and node size of Standard_DS2_v2.

Upgrade a node pool

Note

Upgrade and scale operations on a cluster or node pool cannot occur simultaneously; if attempted, an error is returned. Instead, each operation type must complete on the target resource before the next request on that same resource. Read more about this in our troubleshooting guide.

When your AKS cluster was initially created in the first step, a --kubernetes-version of 1.13.10 was specified. This set the Kubernetes version for both the control plane and the default node pool. The commands in this section explain how to upgrade a single specific node pool.

The relationship between upgrading the Kubernetes version of the control plane and the node pool is explained in the section below.

Note

The node pool OS image version is tied to the Kubernetes version of the cluster. You only get OS image upgrades following a cluster upgrade.

Since there are two node pools in this example, we must use az aks nodepool upgrade to upgrade an individual node pool. Let's upgrade mynodepool to Kubernetes 1.13.10, as shown in the following example:

az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --kubernetes-version 1.13.10 \
    --no-wait

List the status of your node pools again using the az aks nodepool list command. The following example shows that mynodepool is in the Upgrading state, moving to 1.13.10:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 3,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Upgrading",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes to upgrade the nodes to the specified version.

As a best practice, you should upgrade all node pools in an AKS cluster to the same Kubernetes version. The default behavior of az aks upgrade is to upgrade all node pools together with the control plane to achieve this alignment. The ability to upgrade individual node pools lets you perform a rolling upgrade and schedule pods between node pools to maintain application uptime within the constraints mentioned above.

Upgrade a cluster control plane with multiple node pools

Note

Kubernetes uses the standard Semantic Versioning scheme. The version number is expressed as x.y.z, where x is the major version, y is the minor version, and z is the patch version. For example, in version 1.12.6, 1 is the major version, 12 is the minor version, and 6 is the patch version. The Kubernetes version of the control plane and the initial node pool is set during cluster creation. All additional node pools have their Kubernetes version set when they are added to the cluster. The Kubernetes versions may differ between node pools as well as between a node pool and the control plane, but the following restrictions apply:

  • The node pool version must have the same major version as the control plane.
  • The node pool minor version must be within two minor versions of the control plane version.
  • The node pool version may be any patch version as long as the other two constraints are followed.

An AKS cluster has two cluster resource objects with associated Kubernetes versions: the control plane Kubernetes version and the agent pool (node pool) Kubernetes version. A control plane maps to one or many node pools. The behavior of an upgrade operation depends on which Azure CLI command is used.

  1. Upgrading the control plane requires using az aks upgrade
    • This upgrades the control plane version and all node pools in the cluster
    • By passing az aks upgrade with the --control-plane-only flag, only the cluster control plane gets upgraded and none of the associated node pools are changed (see the example after this list). The --control-plane-only flag is available in aks-preview extension version 0.4.16 or higher.
  2. Upgrading individual node pools requires using az aks nodepool upgrade
    • This upgrades only the target node pool with the specified Kubernetes version
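
For example, the following commands (a minimal sketch using the resource group, cluster, and node pool names from this article, with an illustrative version) first upgrade only the control plane, then bring an individual node pool up to the same version:

# Upgrade only the control plane, leaving node pools unchanged
az aks upgrade \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --kubernetes-version 1.13.10 \
    --control-plane-only

# Then upgrade an individual node pool to match
az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --kubernetes-version 1.13.10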

The relationship between Kubernetes versions held by node pools must also follow a set of rules.

  1. You cannot downgrade the control plane nor a node pool Kubernetes version.
  2. If a node pool Kubernetes version is not specified, behavior depends on the client being used. For declarations in Resource Manager templates, the existing version defined for the node pool is used; if none is set, the control plane version is used.
  3. You can either upgrade or scale a control plane or node pool at a given time; you cannot submit both operations simultaneously.
  4. A node pool Kubernetes version must be the same major version as the control plane.
  5. A node pool Kubernetes version can be at most two (2) minor versions less than the control plane, never greater.
  6. A node pool can be any Kubernetes patch version less than or equal to the control plane, never greater.

Scale a node pool manually

As your application workload demands change, you may need to scale the number of nodes in a node pool. The number of nodes can be scaled up or down.

To scale the number of nodes in a node pool, use the az aks nodepool scale command. The following example scales the number of nodes in mynodepool to 5:

az aks nodepool scale \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-count 5 \
    --no-wait

List the status of your node pools again using the az aks nodepool list command. The following example shows that mynodepool is in the Scaling state with a new count of 5 nodes:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 5,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Scaling",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes for the scale operation to complete.

Scale a specific node pool automatically by enabling the cluster autoscaler

AKS offers a separate preview feature to automatically scale node pools, called the cluster autoscaler. This feature is an AKS add-on that can be enabled per node pool with unique minimum and maximum scale counts per node pool. Learn how to use the cluster autoscaler per node pool.
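
As a brief illustration only, the cluster autoscaler is enabled on an individual node pool with the az aks nodepool update command; the minimum and maximum counts below are example values, and the linked article covers the full requirements:

# Enable the cluster autoscaler on mynodepool with example min/max node counts
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5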

Delete a node pool

If you no longer need a pool, you can delete it and remove the underlying VM nodes. To delete a node pool, use the az aks nodepool delete command and specify the node pool name. The following example deletes the mynodepool created in the previous steps:

Caution

There are no recovery options for data loss that may occur when you delete a node pool. If pods can't be scheduled on other node pools, those applications are unavailable. Make sure you don't delete a node pool when in-use applications don't have data backups or the ability to run on other node pools in your cluster.

az aks nodepool delete -g myResourceGroup --cluster-name myAKSCluster --name mynodepool --no-wait

The following example output from the az aks nodepool list command shows that mynodepool is in the Deleting state:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 5,
    ...
    "name": "mynodepool",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Deleting",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes to delete the nodes and the node pool.

Specify a VM size for a node pool

In the previous examples to create a node pool, a default VM size was used for the nodes created in the cluster. A more common scenario is for you to create node pools with different VM sizes and capabilities. For example, you may create a node pool that contains nodes with large amounts of CPU or memory, or a node pool that provides GPU support. In the next step, you use taints and tolerations to tell the Kubernetes scheduler which pods can run on these nodes.

In the following example, create a GPU-based node pool that uses the Standard_NC6 VM size. These VMs are powered by the NVIDIA Tesla K80 card. For information on available VM sizes, see Sizes for Linux virtual machines in Azure.

Create a node pool using the az aks nodepool add command again. This time, specify the name gpunodepool, and use the --node-vm-size parameter to specify the Standard_NC6 size:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpunodepool \
    --node-count 1 \
    --node-vm-size Standard_NC6 \
    --no-wait

The following example output from the az aks nodepool list command shows that gpunodepool is Creating nodes with the specified VmSize:

$ az aks nodepool list -g myResourceGroup --cluster-name myAKSCluster

[
  {
    ...
    "count": 1,
    ...
    "name": "gpunodepool",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Creating",
    ...
    "vmSize": "Standard_NC6",
    ...
  },
  {
    ...
    "count": 2,
    ...
    "name": "nodepool1",
    "orchestratorVersion": "1.13.10",
    ...
    "provisioningState": "Succeeded",
    ...
    "vmSize": "Standard_DS2_v2",
    ...
  }
]

It takes a few minutes for the gpunodepool to be successfully created.

Schedule pods using taints and tolerations

You now have two node pools in your cluster - the default node pool initially created, and the GPU-based node pool. Use the kubectl get nodes command to view the nodes in your cluster. The following example output shows the nodes:

$ kubectl get nodes

NAME                                 STATUS   ROLES   AGE     VERSION
aks-gpunodepool-28993262-vmss000000  Ready    agent   4m22s   v1.13.10
aks-nodepool1-28993262-vmss000000    Ready    agent   115m    v1.13.10

The Kubernetes scheduler can use taints and tolerations to restrict what workloads can run on nodes.

  • A taint is applied to a node to indicate that only specific pods can be scheduled on it.
  • A toleration is then applied to a pod to allow it to tolerate a node's taint.

For more information on how to use advanced Kubernetes scheduler features, see Best practices for advanced scheduler features in AKS.

In this example, apply a taint to your GPU-based node using the kubectl taint node command. Specify the name of your GPU-based node from the output of the previous kubectl get nodes command. The taint is applied as a key:value pair followed by a scheduling effect. The following example uses the sku=gpu pair with the NoSchedule effect, which prevents pods from being scheduled on the node unless they have a matching toleration:

kubectl taint node aks-gpunodepool-28993262-vmss000000 sku=gpu:NoSchedule

The following basic example YAML manifest uses a toleration to allow the Kubernetes scheduler to run an NGINX pod on the GPU-based node. For a more realistic, but time-intensive, example that runs a TensorFlow job against the MNIST dataset, see Use GPUs for compute-intensive workloads on AKS.

Create a file named gpu-toleration.yaml and copy in the following example YAML:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - image: nginx:1.15.9
    name: mypod
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 1
        memory: 2G
  tolerations:
  - key: "sku"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"

Schedule the pod using the kubectl apply -f gpu-toleration.yaml command:

kubectl apply -f gpu-toleration.yaml

It takes a few seconds to schedule the pod and pull the NGINX image. Use the kubectl describe pod command to view the pod status. The following condensed example output shows the sku=gpu:NoSchedule toleration is applied. In the events section, the scheduler has assigned the pod to the aks-gpunodepool-28993262-vmss000000 GPU-based node:

$ kubectl describe pod mypod

[...]
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
                 sku=gpu:NoSchedule
Events:
  Type    Reason     Age    From                                          Message
  ----    ------     ----   ----                                          -------
  Normal  Scheduled  4m48s  default-scheduler                             Successfully assigned default/mypod to aks-gpunodepool-28993262-vmss000000
  Normal  Pulling    4m47s  kubelet, aks-gpunodepool-28993262-vmss000000  pulling image "nginx:1.15.9"
  Normal  Pulled     4m43s  kubelet, aks-gpunodepool-28993262-vmss000000  Successfully pulled image "nginx:1.15.9"
  Normal  Created    4m40s  kubelet, aks-gpunodepool-28993262-vmss000000  Created container
  Normal  Started    4m40s  kubelet, aks-gpunodepool-28993262-vmss000000  Started container

Only pods that have this toleration applied can be scheduled on nodes in gpunodepool. Any other pod would be scheduled in the nodepool1 node pool. If you create additional node pools, you can use additional taints and tolerations to limit what pods can be scheduled on those node resources.

Manage node pools using a Resource Manager template

When you use an Azure Resource Manager template to create and manage resources, you can typically update the settings in your template and redeploy to update the resource. With node pools in AKS, the initial node pool profile can't be updated once the AKS cluster has been created. This behavior means that you can't update an existing Resource Manager template, make a change to the node pools, and redeploy. Instead, you must create a separate Resource Manager template that updates only the agent pools for an existing AKS cluster.

Create a template such as aks-agentpools.json and paste the following example manifest. This example template configures the following settings:

  • Updates the Linux agent pool named myagentpool to run three nodes.
  • Sets the nodes in the node pool to run Kubernetes version 1.13.10.
  • Defines the node size as Standard_DS2_v2.

Edit these values as needed to update, add, or delete node pools:

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "clusterName": {
            "type": "string",
            "metadata": {
                "description": "The name of your existing AKS cluster."
            }
        },
        "location": {
            "type": "string",
            "metadata": {
                "description": "The location of your existing AKS cluster."
            }
        },
        "agentPoolName": {
            "type": "string",
            "defaultValue": "myagentpool",
            "metadata": {
                "description": "The name of the agent pool to create or update."
            }
        },
        "vnetSubnetId": {
            "type": "string",
            "defaultValue": "",
            "metadata": {
                "description": "The Vnet subnet resource ID for your existing AKS cluster."
            }
        }
    },
    "variables": {
        "apiVersion": {
            "aks": "2019-04-01"
        },
        "agentPoolProfiles": {
            "maxPods": 30,
            "osDiskSizeGB": 0,
            "agentCount": 3,
            "agentVmSize": "Standard_DS2_v2",
            "osType": "Linux",
            "vnetSubnetId": "[parameters('vnetSubnetId')]"
        }
    },
    "resources": [
        {
            "apiVersion": "2019-04-01",
            "type": "Microsoft.ContainerService/managedClusters/agentPools",
            "name": "[concat(parameters('clusterName'),'/', parameters('agentPoolName'))]",
            "location": "[parameters('location')]",
            "properties": {
                "maxPods": "[variables('agentPoolProfiles').maxPods]",
                "osDiskSizeGB": "[variables('agentPoolProfiles').osDiskSizeGB]",
                "count": "[variables('agentPoolProfiles').agentCount]",
                "vmSize": "[variables('agentPoolProfiles').agentVmSize]",
                "osType": "[variables('agentPoolProfiles').osType]",
                "storageProfile": "ManagedDisks",
                "type": "VirtualMachineScaleSets",
                "vnetSubnetID": "[variables('agentPoolProfiles').vnetSubnetId]",
                "orchestratorVersion": "1.13.10"
            }
        }
    ]
}

Deploy this template using the az group deployment create command, as shown in the following example. You are prompted for the existing AKS cluster name and location:

az group deployment create \
    --resource-group myResourceGroup \
    --template-file aks-agentpools.json

It may take a few minutes to update your AKS cluster depending on the node pool settings and operations you define in your Resource Manager template.
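
To confirm the deployment applied your node pool settings, you can query the agent pool directly with the az aks nodepool show command. The node pool name below matches the template's default agentPoolName parameter value:

# Check the provisioning state of the node pool managed by the template
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name myagentpool \
    --query provisioningState \
    -o tsv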

Assign a public IP per node in a node pool

Warning

During the preview, assigning a public IP per node cannot be used with the Standard Load Balancer SKU in AKS because of possible load balancer rules conflicting with VM provisioning. While this feature is in preview, you must use the Basic Load Balancer SKU if you need to assign a public IP per node.

AKS nodes do not require their own public IP addresses for communication. However, some scenarios may require nodes in a node pool to have their own public IP addresses. An example is gaming, where a console needs to make a direct connection to a cloud virtual machine to minimize hops. This can be achieved by registering for a separate preview feature, Node Public IP (preview).

az feature register --name NodePublicIPPreview --namespace Microsoft.ContainerService

After successful registration, deploy an Azure Resource Manager template following the same instructions as above, and add the boolean property "enableNodePublicIP" to the agentPoolProfiles. Set the value to true; it defaults to false if not specified. This is a create-time-only property and requires a minimum API version of 2019-06-01. It can be applied to both Linux and Windows node pools.

"agentPoolProfiles":[  
    {  
      "maxPods": 30,
      "osDiskSizeGB": 0,
      "agentCount": 3,
      "agentVmSize": "Standard_DS2_v2",
      "osType": "Linux",
      "vnetSubnetId": "[parameters('vnetSubnetId')]",
      "enableNodePublicIP":true
    }

Clean up resources

In this article, you created an AKS cluster that includes GPU-based nodes. To reduce unnecessary cost, you may want to delete the gpunodepool, or the whole AKS cluster.

To delete the GPU-based node pool, use the az aks nodepool delete command as shown in the following example:

az aks nodepool delete -g myResourceGroup --cluster-name myAKSCluster --name gpunodepool

To delete the cluster itself, use the az group delete command to delete the AKS resource group:

az group delete --name myResourceGroup --yes --no-wait

Next steps

In this article, you learned how to create and manage multiple node pools in an AKS cluster. For more information about how to control pods across node pools, see Best practices for advanced scheduler features in AKS.

To create and use Windows Server container node pools, see Create a Windows Server container in AKS.