Best practices for cluster isolation in Azure Kubernetes Service (AKS)
As you manage clusters in Azure Kubernetes Service (AKS), you often need to isolate teams and workloads. AKS provides flexibility in how you can run multi-tenant clusters and isolate resources. To maximize your investment in Kubernetes, first understand and implement AKS multi-tenancy and isolation features.
This best practices article focuses on isolation for cluster operators. In this article, you learn how to:
- Plan for multi-tenant clusters and separation of resources
- Use logical or physical isolation in your AKS clusters
Design clusters for multi-tenancy
Kubernetes lets you logically isolate teams and workloads in the same cluster. The goal is to provide the least number of privileges, scoped to the resources each team needs. A Kubernetes Namespace creates a logical isolation boundary. Additional Kubernetes features and considerations for isolation and multi-tenancy include the following areas:
Scheduling uses basic features such as resource quotas and pod disruption budgets. For more information about these features, see Best practices for basic scheduler features in AKS.
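As a minimal sketch of the resource quota feature mentioned above, the following Python builds a Kubernetes `ResourceQuota` manifest for one team's namespace and prints it as JSON. The namespace name `team-a`, the helper function, and the limit values are hypothetical and chosen only for illustration.

```python
import json


def resource_quota(namespace: str, cpu: str, memory: str, pods: int) -> dict:
    """Build a ResourceQuota manifest that caps a namespace's total requests."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,        # total CPU all pods may request
                "requests.memory": memory,  # total memory all pods may request
                "pods": str(pods),          # maximum number of pods
            }
        },
    }


# Hypothetical limits for an illustrative namespace.
quota = resource_quota("team-a", cpu="10", memory="20Gi", pods=50)
print(json.dumps(quota, indent=2))
```

You could save the printed JSON to a file and apply it with `kubectl apply -f`, which accepts JSON manifests as well as YAML.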
More advanced scheduler features include:
- Taints and tolerations
- Node selectors
- Node and pod affinity or anti-affinity
For more information about these features, see Best practices for advanced scheduler features in AKS.
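To illustrate how node selectors and tolerations can dedicate nodes to one team, the following sketch builds a pod manifest in Python. It assumes the nodes were labeled and tainted with a hypothetical `team=<name>` key beforehand. The pod name, namespace, and image are placeholders.

```python
import json


def dedicated_pod(name: str, namespace: str, team: str, image: str) -> dict:
    """Build a Pod manifest pinned to nodes reserved for one team."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Node selector: schedule only onto nodes labeled team=<team>.
            "nodeSelector": {"team": team},
            # Toleration: permit scheduling onto nodes tainted team=<team>:NoSchedule.
            "tolerations": [
                {
                    "key": "team",
                    "operator": "Equal",
                    "value": team,
                    "effect": "NoSchedule",
                }
            ],
            "containers": [{"name": name, "image": image}],
        },
    }


# Placeholder names and image, for illustration only.
pod = dedicated_pod("web", "team-a", "team-a", "registry.example.com/web:1.0")
print(json.dumps(pod, indent=2))
```

The taint keeps other teams' pods off the reserved nodes, while the node selector keeps this team's pods from landing elsewhere. Using both together gives isolation in both directions.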
Networking uses network policies to control the flow of traffic in and out of pods.
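The following sketch shows one way a network policy can restrict traffic between tenants. It builds a `NetworkPolicy` that allows ingress only from pods in the same namespace. It assumes the cluster has a network policy engine enabled. The policy and namespace names are hypothetical.

```python
import json


def same_namespace_only(namespace: str) -> dict:
    """Build a NetworkPolicy allowing ingress only from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            # An empty pod selector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A pod selector without a namespace selector matches only pods
            # in the policy's own namespace.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


policy = same_namespace_only("team-a")
print(json.dumps(policy, indent=2))
```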
Authentication and authorization
Authentication and authorization uses:
- Role-based access control (RBAC)
- Azure Active Directory (AD) integration
- Pod identities
- Secrets in Azure Key Vault
For more information about these features, see Best practices for authentication and authorization in AKS.
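As a sketch of least-privilege access with Kubernetes RBAC, the following Python builds a namespaced `Role` and a `RoleBinding` that grants it to a group. The group identifier is a placeholder. The role name, namespace, and the set of resources and verbs are assumptions for illustration, not a recommended policy.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def namespace_role(namespace: str) -> dict:
    """Build a Role that can read and manage pods only in one namespace."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": "pod-manager", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch", "create", "delete"],
            }
        ],
    }


def bind_role_to_group(namespace: str, group: str) -> dict:
    """Build a RoleBinding that grants the Role above to one group."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "pod-manager-binding", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API_GROUP}],
        "roleRef": {"kind": "Role", "name": "pod-manager", "apiGroup": RBAC_API_GROUP},
    }


role = namespace_role("team-a")
# "<group-object-id>" is a placeholder, not a real identifier.
binding = bind_role_to_group("team-a", "<group-object-id>")
print(json.dumps([role, binding], indent=2))
```

Because both objects are namespaced, the group gains no permissions in any other team's namespace.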
Container security includes:
- The Azure Policy Add-on for AKS to enforce pod security.
- The use of pod security contexts.
- Scanning both images and the runtime for vulnerabilities.
- Using AppArmor or seccomp (secure computing) to restrict container access to the underlying node.
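To make the security context and seccomp items above concrete, the following sketch builds a container definition with a restrictive security context. The specific settings are one reasonable starting point under the assumption that the workload can run as a non-root user with a read-only root filesystem. The container name and image are placeholders.

```python
import json


def hardened_container(name: str, image: str) -> dict:
    """Build a container spec with a restrictive security context."""
    return {
        "name": name,
        "image": image,
        "securityContext": {
            "runAsNonRoot": True,               # refuse to start as UID 0
            "allowPrivilegeEscalation": False,  # block setuid-style escalation
            "readOnlyRootFilesystem": True,     # container cannot modify its image
            "capabilities": {"drop": ["ALL"]},  # remove all Linux capabilities
            # Apply the container runtime's default seccomp profile.
            "seccompProfile": {"type": "RuntimeDefault"},
        },
    }


container = hardened_container("web", "registry.example.com/web:1.0")
print(json.dumps(container, indent=2))
```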
Logically isolate clusters
Best practice guidance
Separate teams and projects using logical isolation. Minimize the number of physical AKS clusters you deploy to isolate teams or applications.
With logical isolation, a single AKS cluster can be used for multiple workloads, teams, or environments. Kubernetes Namespaces form the logical isolation boundary for workloads and resources.
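For example, a single cluster might hold one namespace per team and environment. The following sketch generates such `Namespace` manifests. The team names, environment names, and naming scheme are hypothetical.

```python
import json


def team_namespace(team: str, environment: str) -> dict:
    """Build a Namespace manifest labeled with its owning team and environment."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": f"{team}-{environment}",
            "labels": {"team": team, "environment": environment},
        },
    }


# Hypothetical teams and environments sharing one cluster.
namespaces = [
    team_namespace(team, env)
    for team in ("team-a", "team-b")
    for env in ("dev", "prod")
]
names = [ns["metadata"]["name"] for ns in namespaces]
print(json.dumps(namespaces, indent=2))
```

The labels give other namespaced resources, such as quotas, roles, and network policies, a consistent way to identify which tenant a namespace belongs to.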
Logical separation of clusters usually provides a higher pod density than physically isolated clusters, with less excess compute capacity sitting idle in the cluster. When you combine logical separation with the Kubernetes cluster autoscaler, you can scale the number of nodes up or down to meet demand. This best practice approach to autoscaling minimizes costs by running only the number of nodes required.
Currently, Kubernetes environments aren't completely safe for hostile multi-tenant usage. In a multi-tenant environment, multiple tenants work on a common, shared infrastructure. If you can't trust every tenant, you need extra planning to prevent tenants from affecting the security and service of others.
Additional security features, like Kubernetes RBAC for nodes, efficiently block exploits. However, for true security when running hostile multi-tenant workloads, you should only trust a hypervisor. The security domain for Kubernetes becomes the entire cluster, not an individual node.
For these types of hostile multi-tenant workloads, you should use physically isolated clusters.
Physically isolate clusters
Best practice guidance
Minimize the use of physical isolation for each separate team or application deployment. Instead, use logical isolation, as discussed in the previous section.
Physically separating AKS clusters is a common approach to cluster isolation. In this isolation model, teams or workloads are assigned their own AKS cluster. While physical isolation might look like the easiest way to isolate workloads or teams, it adds management and financial overhead. You must maintain multiple clusters and provide access and assign permissions for each one individually. You're also billed for each individual node.
Physically separate clusters usually have a low pod density. Since each team or workload has its own AKS cluster, the cluster is often over-provisioned with compute resources, and only a small number of pods are scheduled on its nodes. Unclaimed node capacity can't be used for applications or services in development by other teams. These excess resources contribute to the additional costs in physically separate clusters.
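The density difference can be illustrated with a small worked calculation. The numbers below are entirely hypothetical, and the model is deliberately simplified: it counts only pods per node and ignores CPU and memory requests, system pods, and any minimum node counts.

```python
import math

# Hypothetical inputs, for illustration only.
pods_per_team = {"team-a": 25, "team-b": 10, "team-c": 40}
pods_per_node = 30  # assumed capacity of one node

total_pods = sum(pods_per_team.values())

# Physical isolation: each team's cluster rounds up to whole nodes on its own.
separate_nodes = sum(math.ceil(p / pods_per_node) for p in pods_per_team.values())

# Logical isolation: all teams share one pool, so rounding happens once.
shared_nodes = math.ceil(total_pods / pods_per_node)

separate_density = total_pods / (separate_nodes * pods_per_node)
shared_density = total_pods / (shared_nodes * pods_per_node)

print(f"separate clusters: {separate_nodes} nodes, {separate_density:.0%} of pod capacity used")
print(f"shared cluster:    {shared_nodes} nodes, {shared_density:.0%} of pod capacity used")
```

In this simplified model, the three separate clusters need four nodes in total while the shared cluster needs three, because unused capacity in one team's cluster can't absorb pods from another team.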
This article focused on cluster isolation. For more information about cluster operations in AKS, see the following best practices: