Kubernetes core concepts for Azure Kubernetes Service (AKS)
Application development continues to move toward a container-based approach, increasing our need to orchestrate and manage resources. As the leading platform, Kubernetes provides reliable scheduling of fault-tolerant application workloads. Azure Kubernetes Service (AKS), a managed Kubernetes offering, further simplifies container-based application deployment and management.
This article introduces:
- Core Kubernetes infrastructure components:
- control plane
- node pools
- Workload resources:
- How to group resources into namespaces.
What is Kubernetes?
Kubernetes is a rapidly evolving platform that manages container-based applications and their associated networking and storage components. Kubernetes focuses on the application workloads, not the underlying infrastructure components. Kubernetes provides a declarative approach to deployments, backed by a robust set of APIs for management operations.
You can build and run modern, portable, microservices-based applications, using Kubernetes to orchestrate and manage the availability of the application components. Kubernetes supports both stateless and stateful applications as teams progress through the adoption of microservices-based applications.
As an open platform, Kubernetes allows you to build your applications with your preferred programming language, OS, libraries, or messaging bus. Existing continuous integration and continuous delivery (CI/CD) tools can integrate with Kubernetes to schedule and deploy releases.
AKS provides a managed Kubernetes service that reduces the complexity of deployment and core management tasks, like upgrade coordination. The Azure platform manages the AKS control plane, and you only pay for the AKS nodes that run your applications. AKS is built on top of the open-source Azure Kubernetes Service Engine: aks-engine.
Kubernetes cluster architecture
A Kubernetes cluster is divided into two components:
- Control plane: provides the core Kubernetes services and orchestration of application workloads.
- Nodes: run your application workloads.
When you create an AKS cluster, a control plane is automatically created and configured. This control plane is provided at no cost as a managed Azure resource abstracted from the user. You only pay for the nodes attached to the AKS cluster. The control plane and its resources reside only on the region where you created the cluster.
The control plane includes the following core Kubernetes components:
|kube-apiserver||The API server is how the underlying Kubernetes APIs are exposed. This component provides the interaction for management tools, such as
|etcd||To maintain the state of your Kubernetes cluster and configuration, the highly available etcd is a key value store within Kubernetes.|
|kube-scheduler||When you create or scale applications, the Scheduler determines what nodes can run the workload and starts them.|
|kube-controller-manager||The Controller Manager oversees a number of smaller Controllers that perform actions such as replicating pods and handling node operations.|
AKS provides a single-tenant control plane, with a dedicated API server, scheduler, etc. You define the number and size of the nodes, and the Azure platform configures the secure communication between the control plane and nodes. Interaction with the control plane occurs through Kubernetes APIs, such as
kubectl or the Kubernetes dashboard.
While you don't need to configure components (like a highly available etcd store) with this managed control plane, you can't access the control plane directly. Kubernetes control plane and node upgrades are orchestrated through the Azure CLI or Azure portal. To troubleshoot possible issues, you can review the control plane logs through Azure Monitor logs.
To configure or directly access a control plane, deploy your own Kubernetes cluster using aks-engine.
For associated best practices, see Best practices for cluster security and upgrades in AKS.
Nodes and node pools
To run your applications and supporting services, you need a Kubernetes node. An AKS cluster has at least one node, an Azure virtual machine (VM) that runs the Kubernetes node components and container runtime.
||The Kubernetes agent that processes the orchestration requests from the control plane and scheduling of running the requested containers.|
|kube-proxy||Handles virtual networking on each node. The proxy routes network traffic and manages IP addressing for services and pods.|
|container runtime||Allows containerized applications to run and interact with additional resources, such as the virtual network and storage. AKS clusters using Kubernetes version 1.19+ for Linux node pools use
The Azure VM size for your nodes defines the storage CPUs, memory, size, and type available (such as high-performance SSD or regular HDD). Plan the node size around whether your applications may require large amounts of CPU and memory or high-performance storage. Scale out the number of nodes in your AKS cluster to meet demand.
In AKS, the VM image for your cluster's nodes is based on Ubuntu Linux or Windows Server 2019. When you create an AKS cluster or scale out the number of nodes, the Azure platform automatically creates and configures the requested number of VMs. Agent nodes are billed as standard VMs, so any VM size discounts (including Azure reservations) are automatically applied.
Deploy your own Kubernetes cluster with aks-engine if using a different host OS, container runtime, or including different custom packages. The upstream
aks-engine releases features and provides configuration options ahead of support in AKS clusters. So, if you wish to use a container runtime other than
containerd or Docker, you can run
aks-engine to configure and deploy a Kubernetes cluster that meets your current needs.
AKS uses node resources to help the node function as part of your cluster. This usage can create a discrepancy between your node's total resources and the allocatable resources in AKS. Remember this information when setting requests and limits for user deployed pods.
To find a node's allocatable resources, run:
kubectl describe node [NODE_NAME]
To maintain node performance and functionality, AKS reserves resources on each node. As a node grows larger in resources, the resource reservation grows due to a higher need for management of user-deployed pods.
Using AKS add-ons such as Container Insights (OMS) will consume additional node resources.
Two types of resources are reserved:
Reserved CPU is dependent on node type and cluster configuration, which may cause less allocatable CPU due to running additional features.
CPU cores on host 1 2 4 8 16 32 64 Kube-reserved (millicores) 60 100 140 180 260 420 740
Memory utilized by AKS includes the sum of two values.
kubeletdaemon is installed on all Kubernetes agent nodes to manage container creation and termination.
By default on AKS,
kubeletdaemon has the memory.available<750Mi eviction rule, ensuring a node must always have at least 750 Mi allocatable at all times. When a host is below that available memory threshold, the
kubeletwill trigger to terminate one of the running pods and free up memory on the host machine.
A regressive rate of memory reservations for the kubelet daemon to properly function (kube-reserved).
- 25% of the first 4 GB of memory
- 20% of the next 4 GB of memory (up to 8 GB)
- 10% of the next 8 GB of memory (up to 16 GB)
- 6% of the next 112 GB of memory (up to 128 GB)
- 2% of any memory above 128 GB
Memory and CPU allocation rules:
- Keep agent nodes healthy, including some hosting system pods critical to cluster health.
- Cause the node to report less allocatable memory and CPU than it would if it were not part of a Kubernetes cluster.
The above resource reservations can't be changed.
For example, if a node offers 7 GB, it will report 34% of memory not allocatable including the 750Mi hard eviction threshold.
0.75 + (0.25*4) + (0.20*3) = 0.75GB + 1GB + 0.6GB = 2.35GB / 7GB = 33.57% reserved
In addition to reservations for Kubernetes itself, the underlying node OS also reserves an amount of CPU and memory resources to maintain OS functions.
For associated best practices, see Best practices for basic scheduler features in AKS.
Nodes of the same configuration are grouped together into node pools. A Kubernetes cluster contains at least one node pool. The initial number of nodes and size are defined when you create an AKS cluster, which creates a default node pool. This default node pool in AKS contains the underlying VMs that run your agent nodes.
To ensure your cluster operates reliably, you should run at least two (2) nodes in the default node pool.
You scale or upgrade an AKS cluster against the default node pool. You can choose to scale or upgrade a specific node pool. For upgrade operations, running containers are scheduled on other nodes in the node pool until all the nodes are successfully upgraded.
For more information about how to use multiple node pools in AKS, see Create and manage multiple node pools for a cluster in AKS.
In an AKS cluster with multiple node pools, you may need to tell the Kubernetes Scheduler which node pool to use for a given resource. For example, ingress controllers shouldn't run on Windows Server nodes.
Node selectors let you define various parameters, like node OS, to control where a pod should be scheduled.
The following basic example schedules an NGINX instance on a Linux node using the node selector "beta.kubernetes.io/os": linux:
kind: Pod apiVersion: v1 metadata: name: nginx spec: containers: - name: myfrontend image: mcr.microsoft.com/oss/nginx/nginx:1.15.12-alpine nodeSelector: "beta.kubernetes.io/os": linux
For more information on how to control where pods are scheduled, see Best practices for advanced scheduler features in AKS.
Kubernetes uses pods to run an instance of your application. A pod represents a single instance of your application.
Pods typically have a 1:1 mapping with a container. In advanced scenarios, a pod may contain multiple containers. Multi-container pods are scheduled together on the same node, and allow containers to share related resources.
When you create a pod, you can define resource requests to request a certain amount of CPU or memory resources. The Kubernetes Scheduler tries meet the request by scheduling the pods to run on a node with available resources. You can also specify maximum resource limits to prevent a pod from consuming too much compute resource from the underlying node. Best practice is to include resource limits for all pods to help the Kubernetes Scheduler identify necessary, permitted resources.
A pod is a logical resource, but application workloads run on the containers. Pods are typically ephemeral, disposable resources. Individually scheduled pods miss some of the high availability and redundancy Kubernetes features. Instead, pods are deployed and managed by Kubernetes Controllers, such as the Deployment Controller.
Deployments and YAML manifests
A deployment represents identical pods managed by the Kubernetes Deployment Controller. A deployment defines the number of pod replicas to create. The Kubernetes Scheduler ensures that additional pods are scheduled on healthy nodes if pods or nodes encounter problems.
You can update deployments to change the configuration of pods, container image used, or attached storage. The Deployment Controller:
- Drains and terminates a given number of replicas.
- Creates replicas from the new deployment definition.
- Continues the process until all replicas in the deployment are updated.
Most stateless applications in AKS should use the deployment model rather than scheduling individual pods. Kubernetes can monitor deployment health and status to ensure that the required number of replicas run within the cluster. When scheduled individually, pods aren't restarted if they encounter a problem, and aren't rescheduled on healthy nodes if their current node encounters a problem.
You don't want to disrupt management decisions with an update process if your application requires a minimum number of available instances. Pod Disruption Budgets define how many replicas in a deployment can be taken down during an update or node upgrade. For example, if you have five (5) replicas in your deployment, you can define a pod disruption of 4 (four) to only allow one replica to be deleted or rescheduled at a time. As with pod resource limits, best practice is to define pod disruption budgets on applications that require a minimum number of replicas to always be present.
Deployments are typically created and managed with
kubectl create or
kubectl apply. Create a deployment by defining a manifest file in the YAML format.
The following example creates a basic deployment of the NGINX web server. The deployment specifies three (3) replicas to be created, and requires port 80 to be open on the container. Resource requests and limits are also defined for CPU and memory.
apiVersion: apps/v1 kind: Deployment metadata: name: nginx spec: replicas: 3 selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - name: nginx image: mcr.microsoft.com/oss/nginx/nginx:1.15.2-alpine ports: - containerPort: 80 resources: requests: cpu: 250m memory: 64Mi limits: cpu: 500m memory: 256Mi
More complex applications can be created by including services (such as load balancers) within the YAML manifest.
For more information, see Kubernetes deployments.
Package management with Helm
Helm is commonly used to manage applications in Kubernetes. You can deploy resources by building and using existing public Helm charts that contain a packaged version of application code and Kubernetes YAML manifests. You can store Helm charts either locally or in a remote repository, such as an Azure Container Registry Helm chart repo.
To use Helm, install the Helm client on your computer, or use the Helm client in the Azure Cloud Shell. Search for or create Helm charts, and then install them to your Kubernetes cluster. For more information, see Install existing applications with Helm in AKS.
StatefulSets and DaemonSets
Using the Kubernetes Scheduler, the Deployment Controller runs replicas on any available node with available resources. While this approach may be sufficient for stateless applications, The Deployment Controller is not ideal for applications that require:
- A persistent naming convention or storage.
- A replica to exist on each select node within a cluster.
Two Kubernetes resources, however, let you manage these types of applications:
- StatefulSets maintain the state of applications beyond an individual pod lifecycle, such as storage.
- DaemonSets ensure a running instance on each node, early in the Kubernetes bootstrap process.
Modern application development often aims for stateless applications. For stateful applications, like those that include database components, you can use StatefulSets. Like deployments, a StatefulSet creates and manages at least one identical pod. Replicas in a StatefulSet follow a graceful, sequential approach to deployment, scale, upgrade, and termination. The naming convention, network names, and storage persist as replicas are rescheduled with a StatefulSet.
Define the application in YAML format using
kind: StatefulSet. From there, the StatefulSet Controller handles the deployment and management of the required replicas. Data is written to persistent storage, provided by Azure Managed Disks or Azure Files. With StatefulSets, the underlying persistent storage remains, even when the StatefulSet is deleted.
For more information, see Kubernetes StatefulSets.
Replicas in a StatefulSet are scheduled and run across any available node in an AKS cluster. To ensure at least one pod in your set runs on a node, you use a DaemonSet instead.
For specific log collection or monitoring, you may need to run a pod on all, or selected, nodes. You can use DaemonSet deploy one or more identical pods, but the DaemonSet Controller ensures that each node specified runs an instance of the pod.
The DaemonSet Controller can schedule pods on nodes early in the cluster boot process, before the default Kubernetes scheduler has started. This ability ensures that the pods in a DaemonSet are started before traditional pods in a Deployment or StatefulSet are scheduled.
Like StatefulSets, a DaemonSet is defined as part of a YAML definition using
For more information, see Kubernetes DaemonSets.
If using the Virtual Nodes add-on, DaemonSets will not create pods on the virtual node.
Kubernetes resources, such as pods and deployments, are logically grouped into a namespace to divide an AKS cluster and restrict create, view, or manage access to resources. For example, you can create namespaces to separate business groups. Users can only interact with resources within their assigned namespaces.
When you create an AKS cluster, the following namespaces are available:
|default||Where pods and deployments are created by default when none is provided. In smaller environments, you can deploy applications directly into the default namespace without creating additional logical separations. When you interact with the Kubernetes API, such as with
|kube-system||Where core resources exist, such as network features like DNS and proxy, or the Kubernetes dashboard. You typically don't deploy your own applications into this namespace.|
|kube-public||Typically not used, but can be used for resources to be visible across the whole cluster, and can be viewed by any user.|
For more information, see Kubernetes namespaces.
This article covers some of the core Kubernetes components and how they apply to AKS clusters. For more information on core Kubernetes and AKS concepts, see the following articles: