Azure AKS Nodes Crashing

Muhammad Usman Khawar
2023-03-29T07:08:50.35+00:00

Hello,

I have a virtual machine scale set configured for my AKS cluster. The nodes keep going into the NotReady state: I flush the nodes and they come back as Ready, but then they go NotReady again.

The AKS version is the latest, 1.26.0.

NAME                             STATUS     ROLES   AGE    VERSION
aks-system-60840726-vmss000006   Ready      agent   118m   v1.26.0
aks-system-60840726-vmss000007   NotReady   agent   81m    v1.26.0
aks-system-60840726-vmss000008   NotReady   agent   48m    v1.26.0
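
Because the nodes flip between Ready and NotReady, it can help to pick out the NotReady ones programmatically each time you check. Below is a minimal Python sketch, assuming the default `kubectl get nodes` column layout shown above (NAME, STATUS, ROLES, AGE, VERSION); the function name `not_ready_nodes` is hypothetical.

```python
def not_ready_nodes(get_nodes_output: str) -> list[str]:
    """Return names of nodes whose STATUS is not 'Ready'.

    Assumes the default `kubectl get nodes` columns:
    NAME STATUS ROLES AGE VERSION.
    """
    names = []
    for line in get_nodes_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[1]
        # A cordoned node shows e.g. "Ready,SchedulingDisabled";
        # only the first part reflects the node's Ready condition.
        if status.split(",")[0] != "Ready":
            names.append(name)
    return names


sample = """\
NAME                             STATUS     ROLES   AGE    VERSION
aks-system-60840726-vmss000006   Ready      agent   118m   v1.26.0
aks-system-60840726-vmss000007   NotReady   agent   81m    v1.26.0
aks-system-60840726-vmss000008   NotReady   agent   48m    v1.26.0
"""

print(not_ready_nodes(sample))
```

In a live cluster you could pass it the captured output of `kubectl get nodes`; for anything long-lived, `kubectl get nodes -o json` gives structured output that is more robust than parsing the text table.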

When I run kubectl describe nodes, I get the following:

For the Ready node (aks-system-60840726-vmss000006):

Events:
  Type     Reason                   Age                From                                                          Message
  ----     ------                   ----               ----                                                          -------
  Normal   NodeHasSufficientPID     50m (x3 over 50m)  kubelet                                                       Node aks-system-60840726-vmss000006 status is now: NodeHasSufficientPID
  Normal   Starting                 50m                kubelet                                                       Starting kubelet.
  Warning  InvalidDiskCapacity      50m                kubelet                                                       invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  50m (x3 over 50m)  kubelet                                                       Node aks-system-60840726-vmss000006 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    50m (x3 over 50m)  kubelet                                                       Node aks-system-60840726-vmss000006 status is now: NodeHasNoDiskPressure
  Normal   NodeAllocatableEnforced  50m                kubelet                                                       Updated Node Allocatable limit across pods
  Warning  Rebooted                 50m                kubelet                                                       Node aks-system-60840726-vmss000006 has been rebooted, boot id: 17c1ec01-17ec-40d5-bdae-9fc9dcc3fb08
  Normal   NodeReady                50m                kubelet                                                       Node aks-system-60840726-vmss000006 status is now: NodeReady
  Normal   VMEventScheduled         50m                custom-scheduledevents-consolidated-condition-plugin-monitor  Node condition VMEventScheduled is now: True, reason: VMEventScheduled
  Warning  RebootScheduled          50m                custom-scheduledevents-consolidated-plugin-monitor            Started :
  Normal   NoVMEventScheduled       50m                custom-scheduledevents-consolidated-condition-plugin-monitor  Node condition VMEventScheduled is now: False, reason: NoVMEventScheduled
  Normal   RegisteredNode           43m                node-controller                                               Node aks-system-60840726-vmss000006 event: Registered Node aks-system-60840726-vmss000006 in Controller
  Normal   RegisteredNode           12m                node-controller                                               Node aks-system-60840726-vmss000006 event: Registered Node aks-system-60840726-vmss000006 in Controller

For a NotReady node (aks-system-60840726-vmss000008):

Events:
  Type     Reason                   Age                From                                                        Message
  ----     ------                   ----               ----                                                        -------
  Normal   NodeHasNoDiskPressure    49m (x2 over 49m)  kubelet                                                     Node aks-system-60840726-vmss000008 status is now: NodeHasNoDiskPressure
  Normal   Starting                 49m                kubelet                                                     Starting kubelet.
  Warning  InvalidDiskCapacity      49m                kubelet                                                     invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  49m (x2 over 49m)  kubelet                                                     Node aks-system-60840726-vmss000008 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     49m (x2 over 49m)  kubelet                                                     Node aks-system-60840726-vmss000008 status is now: NodeHasSufficientPID
  Normal   RegisteredNode           49m                node-controller                                             Node aks-system-60840726-vmss000008 event: Registered Node aks-system-60840726-vmss000008 in Controller
  Normal   CreatedNNC               49m                dnc-rc/node-reconciler                                      Created NodeNetworkConfig aks-system-60840726-vmss000008
  Normal   NodeAllocatableEnforced  49m                kubelet                                                     Updated Node Allocatable limit across pods
  Normal   NodeReady                49m                kubelet                                                     Node aks-system-60840726-vmss000008 status is now: NodeReady
  Warning  ContainerdStart          49m                systemd-monitor                                             Starting containerd container runtime...
  Warning  PreemptScheduled         33m                custom-scheduledevents-consolidated-preempt-plugin-monitor  IMDS query failed, exit code: 28
Connection timed out after 24 seconds. Contact IMDSCoreDevsSG@microsoft.com
  Normal   NodeNotReady             32m                node-controller                                             Node aks-system-60840726-vmss000008 status is now: NodeNotReady
  Normal   RegisteredNode           20m                node-controller                                             Node aks-system-60840726-vmss000008 event: Registered Node aks-system-60840726-vmss000008 in Controller
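
Comparing the two Events lists above by hand is error-prone. The following minimal Python sketch uses a hypothetical helper, `warning_reasons`, to collect the Warning reasons from each node's events and keep the ones that occur only on the NotReady node. The sample text is abridged from the output above, and the parser assumes each event row starts with the Type and Reason columns.

```python
def warning_reasons(events_text: str) -> set[str]:
    """Collect the Reason column of every Warning row in the Events
    section printed by `kubectl describe node`."""
    reasons = set()
    for line in events_text.splitlines():
        fields = line.split()
        # Event rows start with Type (Normal/Warning) followed by Reason;
        # header, separator and wrapped message lines are skipped.
        if len(fields) >= 2 and fields[0] == "Warning":
            reasons.add(fields[1])
    return reasons


# Abridged from the Ready node (vmss000006) events.
ready_events = """\
  Warning  InvalidDiskCapacity  50m  kubelet  invalid capacity 0 on image filesystem
  Warning  Rebooted             50m  kubelet  Node aks-system-60840726-vmss000006 has been rebooted
  Warning  RebootScheduled      50m  custom-scheduledevents-consolidated-plugin-monitor  Started :
"""

# Abridged from the NotReady node (vmss000008) events.
not_ready_events = """\
  Warning  InvalidDiskCapacity  49m  kubelet  invalid capacity 0 on image filesystem
  Warning  ContainerdStart      49m  systemd-monitor  Starting containerd container runtime...
  Warning  PreemptScheduled     33m  custom-scheduledevents-consolidated-preempt-plugin-monitor  IMDS query failed, exit code: 28
Connection timed out after 24 seconds.
"""

only_on_not_ready = warning_reasons(not_ready_events) - warning_reasons(ready_events)
print(sorted(only_on_not_ready))
```

With this data, InvalidDiskCapacity drops out because it occurs on both nodes, leaving ContainerdStart and PreemptScheduled as the warnings seen only on the NotReady node.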

1 answer

  1. vipullag-MSFT
    2023-03-29T08:16:16.1833333+00:00

    Hello Muhammad Usman Khawar,

    Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

    The nodes in your AKS cluster, which uses a virtual machine scale set, keep going into the NotReady state.

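    When you run kubectl describe nodes, the output includes the warning InvalidDiskCapacity ("invalid capacity 0 on image filesystem"), meaning the kubelet read a capacity of 0 for the image filesystem. Note, however, that this warning appears on the Ready node (vmss000006) as well as on the NotReady node (vmss000008), both times right after "Starting kubelet.", so it may not explain the NotReady state by itself. The NotReady node also shows a PreemptScheduled warning reporting that an IMDS query timed out (exit code 28) about a minute before the NodeNotReady event.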
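
    The PreemptScheduled warning on the NotReady node reports that an IMDS (Azure Instance Metadata Service) query timed out. One way to check whether IMDS responds from the node is to query its Scheduled Events endpoint directly. The minimal Python sketch below assumes the endpoint URL, the `api-version` value, and the `Metadata: true` header shown; verify them against the Azure Scheduled Events documentation. `fetch_scheduled_events`, `summarize_events`, and the example document are hypothetical and only illustrate the response shape.

```python
import json
import urllib.request

# Assumed endpoint and api-version for the Azure IMDS Scheduled Events API;
# check the current Azure documentation before relying on them.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def fetch_scheduled_events(timeout: float = 5.0) -> dict:
    """Query IMDS from inside the node VM.

    Raises OSError (e.g. urllib.error.URLError or a socket timeout)
    if IMDS does not answer within `timeout` seconds.
    """
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_events(document: dict) -> list[tuple[str, str]]:
    """Return (EventType, EventStatus) pairs from a Scheduled Events document."""
    return [
        (event.get("EventType", "?"), event.get("EventStatus", "?"))
        for event in document.get("Events", [])
    ]


# Hypothetical document used only to show the shape summarize_events expects.
example = {
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "hypothetical-id", "EventType": "Reboot", "EventStatus": "Scheduled"},
    ],
}
print(summarize_events(example))
```

    Running fetch_scheduled_events() on the affected node and getting a timeout would be consistent with the PreemptScheduled warning in the node events.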

    You can try to resolve this issue by deleting the nodes that are in a failed state, or otherwise removing them from the cluster, before upgrading. You can also try Scale-down Mode in Azure Kubernetes Service (AKS), which controls whether nodes removed during a scale-down are deleted (the default) or deallocated.

    Hope this helps.
