Best practices for basic scheduler features in Azure Kubernetes Service (AKS)

As you manage clusters in Azure Kubernetes Service (AKS), you often need to isolate teams and workloads. The Kubernetes scheduler provides features that let you control the distribution of compute resources and limit the impact of maintenance events.

This best practices article focuses on basic Kubernetes scheduling features for cluster operators. In this article, you learn how to:

  • Use resource quotas to provide a fixed amount of resources to teams or workloads
  • Limit the impact of scheduled maintenance using pod disruption budgets
  • Check for missing pod resource requests and limits using the kube-advisor tool

Enforce resource quotas

Best practice guidance - Plan and apply resource quotas at the namespace level. If pods don't define resource requests and limits, reject the deployment. Monitor resource usage and adjust quotas as needed.

Resource requests and limits are placed in the pod specification. The Kubernetes scheduler uses the requests at deployment time to find a node in the cluster with enough available resources. Requests and limits work at the individual pod level. For more information about how to define these values, see Define pod resource requests and limits.
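As a minimal sketch of where these values live, the following pod manifest sets a request and a limit on a single container. The pod name, container name, image, and the specific CPU and memory figures are illustrative assumptions, not values taken from this article:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-app            # hypothetical pod name
spec:
  containers:
  - name: sample-app          # hypothetical container name
    image: nginx:stable       # stand-in image for illustration
    resources:
      requests:               # what the scheduler reserves when placing the pod
        cpu: 100m
        memory: 128Mi
      limits:                 # the most the container is allowed to consume
        cpu: 250m
        memory: 256Mi
```

Requests and limits are set per container. The pod's totals are the sum across its containers.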

To reserve and limit resources across a development team or project, use resource quotas. These quotas are defined on a namespace, and can be set on the following basis:

  • Compute resources, such as CPU and memory, or GPUs.
  • Storage resources, including the total number of volumes or the amount of disk space for a given storage class.
  • Object count, such as the maximum number of secrets, services, or jobs that can be created.
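To make the three categories concrete, here is a hedged sketch of a single quota that touches each of them. The quota name, all numeric values, and the choice of the managed-premium storage class are assumptions for illustration. The GPU line only has an effect on clusters whose nodes expose the nvidia.com/gpu extended resource:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-app-team-extended        # hypothetical quota name
spec:
  hard:
    # Compute resources
    requests.cpu: "10"
    requests.memory: 20Gi
    requests.nvidia.com/gpu: "2"     # assumes GPU nodes with the NVIDIA device plugin
    # Storage resources
    persistentvolumeclaims: "10"     # total number of claims in the namespace
    requests.storage: 100Gi          # total requested storage across all claims
    managed-premium.storageclass.storage.k8s.io/requests.storage: 50Gi   # assumed storage class name
    # Object counts
    count/secrets: "20"
    count/services: "10"
    count/jobs.batch: "5"
```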

Kubernetes doesn't overcommit resources. Once the cumulative total of resource requests or limits passes the assigned quota, no further deployments succeed.

When you define resource quotas, all pods created in the namespace must provide limits or requests in their pod specifications. If they don't provide these values, you can reject the deployment. Alternatively, you can configure default requests and limits for a namespace.
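Namespace defaults are configured with a LimitRange object. The following is a sketch under assumed values: the object name and the CPU and memory figures are hypothetical and should be sized for your own workloads. Containers created in the namespace without their own requests or limits receive these defaults:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-app-team-defaults   # hypothetical name
spec:
  limits:
  - type: Container
    defaultRequest:             # applied when a container omits resource requests
      cpu: 250m
      memory: 256Mi
    default:                    # applied when a container omits resource limits
      cpu: 500m
      memory: 512Mi
```

Like a resource quota, a LimitRange takes effect in the namespace it is applied to.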

The following example YAML manifest named dev-app-team-quotas.yaml sets a hard limit of a total of 10 CPUs, 20Gi of memory, and 10 pods:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-app-team
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"

This resource quota can be applied by specifying the namespace, such as dev-apps:

kubectl apply -f dev-app-team-quotas.yaml --namespace dev-apps

Work with your application developers and owners to understand their needs and apply the appropriate resource quotas.

For more information about available resource objects, scopes, and priorities, see Resource quotas in Kubernetes.

Plan for availability using pod disruption budgets

Best practice guidance - To maintain the availability of applications, define Pod Disruption Budgets (PDBs) to make sure that a minimum number of pods are available in the cluster.

Two kinds of disruptive events cause pods to be removed:

  • Involuntary disruptions are events beyond the typical control of the cluster operator or application owner.
    • These involuntary disruptions include a hardware failure on the physical machine, a kernel panic, or the deletion of a node VM.
  • Voluntary disruptions are events requested by the cluster operator or application owner.
    • These voluntary disruptions include cluster upgrades, an updated deployment template, or accidentally deleting a pod.

Involuntary disruptions can be mitigated by using multiple replicas of your pods in a deployment. Running multiple nodes in the AKS cluster also helps with these involuntary disruptions. For voluntary disruptions, Kubernetes provides pod disruption budgets that let the cluster operator define a minimum available or maximum unavailable resource count. These pod disruption budgets let you plan for how deployments or replica sets respond when a voluntary disruption event occurs.

If a cluster is to be upgraded or a deployment template updated, the Kubernetes scheduler makes sure additional pods are scheduled on other nodes before the voluntary disruption events can continue. The scheduler waits to reboot a node until the defined number of pods are successfully scheduled on other nodes in the cluster.

Let's look at an example of a replica set with five pods that run NGINX. The pods in the replica set are assigned the label app: nginx-frontend. During a voluntary disruption event, such as a cluster upgrade, you want to make sure at least three pods continue to run. The following YAML manifest for a PodDisruptionBudget object defines these requirements:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
   name: nginx-pdb
spec:
   # Keep at least three matching pods running during voluntary disruptions
   minAvailable: 3
   selector:
      matchLabels:
         app: nginx-frontend

You can also define a percentage, such as 60%, which allows the budget to adjust automatically when the replica set scales up the number of pods.
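As a sketch of the percentage form, the manifest below is an alternative version of nginx-pdb rather than a second budget to apply alongside it. It assumes a cluster that serves the policy/v1 API (Kubernetes 1.21 or later). With the five-replica example, 60% works out to three pods:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: "60%"         # percentage values are written as quoted strings
  selector:
    matchLabels:
      app: nginx-frontend
```

When a percentage doesn't map to a whole number of pods, Kubernetes rounds up to the nearest integer.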

You can also define a maximum number of unavailable instances in a replica set. Again, a percentage for the maximum unavailable pods can be defined instead. The following pod disruption budget YAML manifest specifies that no more than two pods in the replica set can be unavailable:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
   name: nginx-pdb
spec:
   # Allow at most two matching pods to be unavailable at the same time
   maxUnavailable: 2
   selector:
      matchLabels:
         app: nginx-frontend

Once your pod disruption budget is defined, you create it in your AKS cluster as with any other Kubernetes object:

kubectl apply -f nginx-pdb.yaml

Work with your application developers and owners to understand their needs and apply the appropriate pod disruption budgets.

For more information about using pod disruption budgets, see Specify a disruption budget for your application.

Regularly check for cluster issues with kube-advisor

Best practice guidance - Regularly run the latest version of the kube-advisor open source tool to detect issues in your cluster. If you apply resource quotas on an existing AKS cluster, run kube-advisor first to find pods that don't have resource requests and limits defined.

The kube-advisor tool is an associated AKS open source project that scans a Kubernetes cluster and reports on issues that it finds. One useful check is to identify pods that don't have resource requests and limits in place.

The kube-advisor tool can report on resource requests and limits missing in PodSpecs for Windows applications as well as Linux applications, but the kube-advisor tool itself must be scheduled as a Linux pod. You can schedule a pod to run on a node pool with a specific OS using a node selector in the pod's configuration.
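The node selector mentioned above can be sketched as follows. The pod name and image are placeholders (this is not the kube-advisor deployment itself). The example relies on the well-known kubernetes.io/os node label to pin the pod to Linux nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: linux-only-pod          # hypothetical pod name
spec:
  nodeSelector:
    kubernetes.io/os: linux     # schedule only onto nodes that run Linux
  containers:
  - name: app                   # hypothetical container name
    image: nginx:stable         # stand-in image for illustration
```

Setting the value to windows instead targets Windows node pools in the same way.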

In an AKS cluster that hosts multiple development teams and applications, it can be hard to track pods that don't have these resource requests and limits set. As a best practice, regularly run kube-advisor on your AKS clusters, especially if you don't assign resource quotas to namespaces.

Next steps

This article focused on basic Kubernetes scheduler features. For more information about cluster operations in AKS, see the following best practices: