Node auto-repair
Applies to: AKS on Azure Stack HCI 22H2, AKS on Windows Server
To help minimize service disruptions for clusters, AKS enabled by Azure Arc (AKS Arc) continuously monitors the health state of worker nodes and automatically repairs nodes that become unhealthy. This article describes how AKS Arc checks for unhealthy nodes and automatically repairs both Windows and Linux nodes. It also shows how to check node health manually.
How AKS checks for unhealthy nodes
AKS Arc uses the following rules to determine if a node is unhealthy and needs repair:
- The node reports a NotReady status on consecutive checks.
- The node doesn't report any status within 20-30 minutes.
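The two rules above can be read as a simple either/or test. The following is a minimal sketch of that logic only, not the AKS Arc implementation: the function name `needs_repair` is hypothetical, and because the article doesn't state how many consecutive checks are required, both thresholds are passed in by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of AKS Arc).
# Usage: needs_repair NOT_READY_STREAK MINUTES_SILENT STREAK_LIMIT SILENCE_LIMIT
# Returns 0 (true) when either rule matches:
#   1. the node reported NotReady on at least STREAK_LIMIT consecutive checks
#   2. the node reported no status for at least SILENCE_LIMIT minutes
# All four arguments are non-negative integers supplied by the caller.
needs_repair() {
    not_ready_streak=$1
    minutes_silent=$2
    streak_limit=$3
    silence_limit=$4
    [ "$not_ready_streak" -ge "$streak_limit" ] ||
        [ "$minutes_silent" -ge "$silence_limit" ]
}
```

For example, with an assumed limit of 2 consecutive checks and a 30-minute silence window, `needs_repair 3 0 2 30` succeeds, while `needs_repair 1 5 2 30` does not.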
You can manually check the health state of your nodes with kubectl, as follows:
kubectl get nodes
The status of the nodes should look similar to the following output:
NAME              STATUS   ROLES    AGE   VERSION
moc-l2tlqojhk2d   Ready    master   46h   v1.19.7
moc-l8h8i6lxk1h   Ready    <none>   46h   v1.19.7
moc-lqnjufwo2cy   Ready    master   46h   v1.19.7
moc-ltyl8mqy47z   Ready    <none>   47h   v1.19.7
moc-lwn5xnrapnj   Ready    master   47h   v1.19.7
moc-wvt025q406z   Ready    <none>   47h   v1.19.7
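On a larger cluster it can help to list only the nodes that aren't healthy. The following is a small sketch that filters the output format shown above. The function name `not_ready_nodes` is an assumption made for illustration; `kubectl get nodes` and `awk` are the only external tools it relies on.

```shell
#!/bin/sh
# Print the names of nodes whose STATUS column is anything other than
# exactly "Ready". Reads `kubectl get nodes` output on stdin and skips
# the header row. Because the match is exact, a cordoned node (shown as
# "Ready,SchedulingDisabled") is also listed.
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage against a live cluster (requires kubectl and a valid kubeconfig):
#   kubectl get nodes | not_ready_nodes
```

If the command prints nothing, every node reported a Ready status at the time of the check.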
How automatic repair works
If AKS Arc identifies a node that remains unhealthy for more than 20-30 minutes, it creates and reimages a new node.
It usually takes 20 to 30 minutes to repair the node. If AKS Arc finds multiple unhealthy nodes during a health check, it repairs them one at a time: each repair finishes before the next one begins.
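Because repairs run one at a time, the total repair time grows with the number of unhealthy nodes. The following is a rough estimate only: it assumes every repair falls within the typical 20 to 30 minutes stated above and that no time passes between repairs, and it excludes the time taken to detect the unhealthy nodes. The function name `repair_window` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: repair_window UNHEALTHY_NODE_COUNT
# Prints the estimated minimum and maximum total repair time in minutes,
# assuming strictly sequential repairs of 20 to 30 minutes each.
repair_window() {
    count=$1
    echo "$((count * 20)) $((count * 30))"
}
```

For example, `repair_window 3` prints `60 90`, so three unhealthy nodes could take roughly one to one and a half hours to repair under these assumptions.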