Scalability considerations for Azure Kubernetes Service environments

Azure Kubernetes Service (AKS) can be scaled in and out to meet infrastructure needs (such as requiring more or less capacity, or adding node pools with special capabilities like GPUs) and application needs. On the application side, several factors come into play, such as the number and rate of concurrent connections, the number of requests, and back-end latencies of the applications running on AKS.

The most common scalability options for AKS are the cluster autoscaler and the horizontal pod autoscaler. The cluster autoscaler adjusts the number of nodes based on the requested compute resources in the node pool. The horizontal pod autoscaler (HPA) adjusts the number of pods in a deployment depending on CPU utilization or other configured metrics.
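As a minimal sketch of enabling both autoscalers (the cluster name `myAKSCluster`, resource group `myResourceGroup`, and deployment name `my-app` are illustrative placeholders, and the count thresholds are examples, not recommendations):

```shell
# Enable the cluster autoscaler on an existing AKS cluster,
# letting the default node pool scale between 1 and 5 nodes.
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 5

# Create a horizontal pod autoscaler for a deployment,
# targeting 70% average CPU utilization across 2-10 replicas.
kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
```

These commands configure cloud resources, so they require an authenticated Azure CLI session and `kubectl` context for the cluster.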

Design considerations

Here are some crucial factors to consider:

  • Does your application require rapid scaling, with no time to wait for new capacity?

    • For quick pod provisioning, use virtual nodes. Note that virtual nodes are supported only with Linux nodes and pods.
  • Is the workload not time sensitive, and can it handle interruptions? If so, consider using Spot VMs.

  • Is the underlying infrastructure (network plug-in, IP ranges, subscription limits, quotas, and so on) capable of scaling out?

  • Consider automating scalability:

    • Enable the cluster autoscaler to scale the number of nodes, and consider combining it with scale-to-zero.
    • Enable the horizontal pod autoscaler to automatically scale the number of pods.
  • Consider scalability with multiple zones and node pools:

    • When creating node pools, consider setting Availability Zones with AKS.
    • Consider using multiple node pools to support applications with different requirements.
    • Scale node pools with the cluster autoscaler.
    • You can scale user node pools to zero. Be aware of the limitations.
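The multizone and Spot considerations above can be sketched with the Azure CLI (all names, counts, and the resource group are illustrative placeholders):

```shell
# Add a user node pool spread across Availability Zones 1-3
# so node capacity survives a single-zone outage.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name userpool \
  --node-count 3 \
  --zones 1 2 3

# Add a Spot-based node pool for interruptible, non-time-sensitive
# workloads; --spot-max-price -1 pays up to the on-demand price.
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 2
```

Spot node pools must be user node pools (not the system pool), since Spot nodes can be evicted at any time.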

Design recommendations

Follow these best practices for your design:

  • Use virtual machine scale sets (VMSS), which are required for scenarios including autoscaling, multiple node pools, and Windows node pool support.
    • Don't manually enable or edit VMSS scalability settings in the Azure portal or using the Azure CLI. Instead, use the cluster autoscaler.
  • If you need fast burst scaling, burst from the AKS cluster by using Azure Container Instances and virtual nodes, which provide rapid, large-scale capacity with per-second billing.
  • Use the cluster autoscaler with scale-to-zero for predictable scalability on VM-based worker nodes.
  • Enable cluster autoscaler to meet application demands.
  • Enable the horizontal pod autoscaler (HPA) to handle peak-hour demand on your application.
    • All your containers and pods must have resource requests and limits defined.
    • The HPA automatically scales the number of pods based on observed CPU or memory utilization, or on custom metrics.
  • Enable Azure Monitor for containers and live monitoring to monitor the cluster and workload utilization.
  • Use multiple node pools when your applications have different resource requirements.
  • Consider Spot VM-based node pools for non-time-sensitive workloads that can handle interruptions and evictions.
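Tying the HPA recommendations together, here is an illustrative manifest (the names `my-app-hpa` and `my-app` and all numeric values are hypothetical). The HPA computes utilization relative to each container's resource *requests*, which is why every container should have requests and limits defined:

```yaml
# Illustrative HPA (autoscaling/v2) targeting a Deployment named my-app.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Scale out when average CPU usage across pods exceeds
          # 70% of the CPU requested by each pod.
          averageUtilization: 70
```

Apply it with `kubectl apply -f hpa.yaml`; if the target Deployment's containers lack CPU requests, the HPA cannot compute utilization and will not scale.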