Create an Azure Kubernetes Service (AKS) cluster that uses availability zones

An Azure Kubernetes Service (AKS) cluster distributes resources such as nodes and storage across logical sections of the underlying Azure infrastructure. When availability zones are used, this deployment model ensures that nodes in one availability zone are physically separated from nodes in other availability zones. AKS clusters deployed with multiple availability zones provide a higher level of availability and protect against hardware failures and planned maintenance events.

When node pools in a cluster span multiple zones, nodes in a given node pool can continue operating even if a single zone goes down. If your applications are orchestrated to tolerate the failure of a subset of nodes, they remain available even when there's a physical failure in a single datacenter.

This article shows you how to create an AKS cluster and distribute the node components across availability zones.

Before you begin

You need the Azure CLI version 2.0.76 or later installed and configured. Run az --version to find the version. If you need to install or upgrade, see Install Azure CLI.
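
For example, you can check the installed version from a Bash shell:

az --version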

Limitations and region availability

AKS clusters can currently be created using availability zones in the following regions:

  • Central US
  • East US 2
  • East US
  • France Central
  • Japan East
  • North Europe
  • Southeast Asia
  • UK South
  • West Europe
  • West US 2

The following limitations apply when you create an AKS cluster using availability zones:

  • You can only define availability zones when the cluster or node pool is created.
  • Availability zone settings can't be updated after the cluster is created. You also can't update an existing, non-availability zone cluster to use availability zones.
  • The node size (VM SKU) you select must be available across all of the availability zones you select (see the example after this list).
  • Clusters with availability zones enabled require the use of Azure Standard Load Balancers for distribution across zones. This load balancer type can only be defined at cluster creation time. For more information and the limitations of the standard load balancer, see Azure load balancer standard SKU limitations.
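
For example, before creating the cluster you can check whether a given VM size is available with zone support in your region; the Standard_DS2_v2 size here is only an illustration:

az vm list-skus --location eastus2 --size Standard_DS2_v2 --zone --output table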

Azure disks limitations

Volumes that use Azure managed disks are currently not zone-redundant resources. Volumes cannot be attached across zones and must be co-located in the same zone as a given node hosting the target pod.

If you must run stateful workloads, use node pool taints and tolerations in your pod specs to group pod scheduling in the same zone as your disks, as in the sketch below. Alternatively, use network-based storage such as Azure Files, which can attach to pods as they're scheduled across zones.
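
The following is a minimal sketch rather than a complete stateful setup: it assumes a cluster like the myAKSCluster created later in this article, a hypothetical node pool named zone1pool pinned to zone 1 and tainted with app=stateful:NoSchedule, and the agentpool label that AKS applies to its nodes. The pod both tolerates the taint and selects the pool, so it stays in the same zone as its disk.

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name zone1pool \
    --node-count 1 \
    --zones 1 \
    --node-taints app=stateful:NoSchedule

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: stateful-app
spec:
  # Tolerate the taint so the pod is allowed onto the tainted, zone-pinned pool
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "stateful"
    effect: "NoSchedule"
  # Select the zone-pinned pool so the pod is co-located with its disk's zone
  nodeSelector:
    agentpool: zone1pool
  containers:
  - name: app
    image: nginx
EOF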

Overview of availability zones for AKS clusters

Availability zones are a high-availability offering that protects your applications and data from datacenter failures. Zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, there's a minimum of three separate zones in all zone-enabled regions.

For more information, see What are availability zones in Azure?.

AKS clusters that are deployed using availability zones can distribute nodes across multiple zones within a single region. For example, a cluster in the East US 2 region can create nodes in all three availability zones in East US 2. This distribution of AKS cluster resources improves cluster availability because the cluster is resilient to the failure of a specific zone.

AKS node distribution across availability zones

If a single zone becomes unavailable, your applications continue to run if the cluster is spread across multiple zones.

Create an AKS cluster across availability zones

When you create a cluster using the az aks create command, the --zones parameter defines which zones the agent nodes are deployed into. The control plane components, such as etcd, are spread across three zones if you define the --zones parameter at cluster creation time. The specific zones the control plane components are spread across are independent of the explicit zones selected for the initial node pool.

If you don't define any zones for the default agent pool when you create an AKS cluster, the control plane components aren't guaranteed to spread across availability zones. You can add additional node pools using the az aks nodepool add command and specify --zones for the new nodes, but doing so doesn't change how the control plane is spread across zones. Availability zone settings can only be defined at cluster or node pool creation time.
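
For example, a zone-spanning user node pool can be added to an existing cluster with az aks nodepool add; the pool name zonepool below is illustrative, and the cluster name matches the myAKSCluster created in the next step:

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name zonepool \
    --node-count 3 \
    --zones 1 2 3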

The following example creates an AKS cluster named myAKSCluster in the resource group named myResourceGroup. A total of three nodes are created: one agent node in zone 1, one in zone 2, and one in zone 3.

az group create --name myResourceGroup --location eastus2

az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --generate-ssh-keys \
    --vm-set-type VirtualMachineScaleSets \
    --load-balancer-sku standard \
    --node-count 3 \
    --zones 1 2 3

It takes a few minutes to create the AKS cluster.

When deciding which zone a new node should belong to, a given AKS node pool uses the best-effort zone balancing offered by the underlying Azure Virtual Machine Scale Sets. A node pool is considered "balanced" when each zone has the same number of VMs, plus or minus one VM, as all other zones in the scale set. For example, a five-node pool spread 2-2-1 across three zones is balanced, while a 3-1-1 spread is not.

Verify node distribution across zones

When the cluster is ready, list the agent nodes in the scale set to see what availability zone they're deployed in.

First, get the AKS cluster credentials using the az aks get-credentials command:

az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

Next, use the kubectl describe command to list the nodes in the cluster and filter on the failure-domain.beta.kubernetes.io/zone value. The following example is for a Bash shell.

kubectl describe nodes | grep -e "Name:" -e "failure-domain.beta.kubernetes.io/zone"

The following example output shows the three nodes distributed across the specified region and availability zones, such as eastus2-1 for the first availability zone and eastus2-2 for the second availability zone:

Name:       aks-nodepool1-28993262-vmss000000
            failure-domain.beta.kubernetes.io/zone=eastus2-1
Name:       aks-nodepool1-28993262-vmss000001
            failure-domain.beta.kubernetes.io/zone=eastus2-2
Name:       aks-nodepool1-28993262-vmss000002
            failure-domain.beta.kubernetes.io/zone=eastus2-3

As you add additional nodes to an agent pool, the Azure platform automatically distributes the underlying VMs across the specified availability zones.

Note that in newer Kubernetes versions (1.17.0 and later), AKS uses the newer topology.kubernetes.io/zone label in addition to the deprecated failure-domain.beta.kubernetes.io/zone label.
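
For example, on clusters running Kubernetes 1.17.0 or later, you can filter on the newer label instead:

kubectl describe nodes | grep -e "Name:" -e "topology.kubernetes.io/zone"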

Verify pod distribution across zones

As documented in Well-Known Labels, Annotations and Taints, Kubernetes uses the failure-domain.beta.kubernetes.io/zone label to automatically distribute pods in a replication controller or service across the available zones. To test this, scale up your cluster from 3 to 5 nodes and verify that the pods spread correctly:

az aks scale \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-count 5

When the scale operation completes after a few minutes, run the command kubectl describe nodes | grep -e "Name:" -e "failure-domain.beta.kubernetes.io/zone" in a Bash shell again. The output should be similar to this sample:

Name:       aks-nodepool1-28993262-vmss000000
            failure-domain.beta.kubernetes.io/zone=eastus2-1
Name:       aks-nodepool1-28993262-vmss000001
            failure-domain.beta.kubernetes.io/zone=eastus2-2
Name:       aks-nodepool1-28993262-vmss000002
            failure-domain.beta.kubernetes.io/zone=eastus2-3
Name:       aks-nodepool1-28993262-vmss000003
            failure-domain.beta.kubernetes.io/zone=eastus2-1
Name:       aks-nodepool1-28993262-vmss000004
            failure-domain.beta.kubernetes.io/zone=eastus2-2

We now have two additional nodes, one in zone 1 and one in zone 2. You can deploy an application consisting of three replicas. We'll use NGINX as an example (recent kubectl versions no longer create a Deployment with kubectl run, so kubectl create deployment is used here):

kubectl create deployment nginx --image=nginx --replicas=3

If you view the nodes where your pods are running, you see that the pods are running on nodes that correspond to three different availability zones. For example, with the command kubectl describe pod | grep -e "^Name:" -e "^Node:" in a Bash shell, you get output similar to this:

Name:         nginx-6db489d4b7-ktdwg
Node:         aks-nodepool1-28993262-vmss000000/10.240.0.4
Name:         nginx-6db489d4b7-v7zvj
Node:         aks-nodepool1-28993262-vmss000002/10.240.0.6
Name:         nginx-6db489d4b7-xz6wj
Node:         aks-nodepool1-28993262-vmss000004/10.240.0.8

As you can see from the previous output, the first pod is running on node 0, which is located in the availability zone eastus2-1. The second pod is running on node 2, which corresponds to eastus2-3, and the third pod is running on node 4, in eastus2-2. Without any additional configuration, Kubernetes spreads the pods correctly across all three availability zones.
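
The default spreading shown above relies on the scheduler's built-in zone awareness. If you want to make the zone spreading explicit, the following is a minimal sketch (not part of the original walkthrough) that assumes Kubernetes 1.18 or later and uses a topologySpreadConstraints block in the pod template; the deployment name nginx-spread is illustrative:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-spread
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-spread
  template:
    metadata:
      labels:
        app: nginx-spread
    spec:
      # Keep the per-zone replica count within 1 of every other zone
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: nginx-spread
      containers:
      - name: nginx
        image: nginx
EOF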

Next steps

This article detailed how to create an AKS cluster that uses availability zones. For more considerations on highly available clusters, see Best practices for business continuity and disaster recovery in AKS.