Monitor performance of your cluster
Once you enable Container insights on a cluster, you can monitor the performance and health status of the cluster components and their workloads. Start with a summary view of all of your monitored clusters and then drill into the details of a particular cluster using built-in workbooks.
The Monitored clusters tab presents the following information for each of your monitored clusters:
- Cluster status summary, showing a count of clusters for each status
- Whether all of the AKS deployments are healthy
- How many nodes and user and system pods are deployed per cluster
- How much disk space is available and if there's a capacity issue
The health statuses included are:
- Healthy: No issues are detected for the VM, and it's functioning as required.
- Critical: One or more critical issues are detected that must be addressed to restore the normal operational state.
- Warning: One or more issues are detected that must be addressed or the health condition could become critical.
- Unknown: If the service wasn't able to make a connection with the node or pod, the status changes to an Unknown state.
- Not found: The workspace, the resource group, or the subscription that contains the workspace for this solution was deleted.
- Unauthorized: The user doesn't have the required permissions to read the data in the workspace.
- Error: An error occurred while attempting to read data from the workspace.
- Misconfigured: Container insights wasn't configured correctly in the specified workspace.
- No data: Data hasn't been reported to the workspace for the last 30 minutes.
The overall cluster status is calculated as the worst of three health states (user pod, system pod, and node), with one exception: if any of the three states is Unknown, the overall cluster state shows Unknown.
The following table provides a breakdown of the calculation that controls the health states for a monitored cluster on the multi-cluster view.
Monitored cluster | Status | Availability |
---|---|---|
User pod | Healthy | 100% |
User pod | Warning | 90 - 99% |
User pod | Critical | <90% |
User pod | Unknown | If not reported in last 30 minutes |
System pod | Healthy | 100% |
System pod | Warning | N/A |
System pod | Critical | <100% |
System pod | Unknown | If not reported in last 30 minutes |
Node | Healthy | >85% |
Node | Warning | 60 - 84% |
Node | Critical | <60% |
Node | Unknown | If not reported in last 30 minutes |
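
The thresholds and roll-up rule described above can be sketched in a few lines of Python. This is an illustration only, not code from Container insights: the function names, the `Status` enum, and the `minutes_since_report` parameter are hypothetical. It also makes assumptions at the boundaries the table leaves open: a node at exactly 85% is treated as Healthy, fractional percentages fall into the next lower band, and "not reported in last 30 minutes" is read as strictly more than 30 minutes.

```python
from enum import IntEnum

# Assumption: a component is "not reported" once more than 30 minutes have passed.
REPORT_TIMEOUT_MINUTES = 30


class Status(IntEnum):
    """Health states ordered so that a larger value means a worse state.

    UNKNOWN ranks highest so that any Unknown input wins the roll-up.
    """
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def user_pod_status(availability_pct: float, minutes_since_report: float) -> Status:
    """User pods: 100% Healthy, 90 - 99% Warning, <90% Critical."""
    if minutes_since_report > REPORT_TIMEOUT_MINUTES:
        return Status.UNKNOWN
    if availability_pct >= 100:
        return Status.HEALTHY
    if availability_pct >= 90:
        return Status.WARNING
    return Status.CRITICAL


def system_pod_status(availability_pct: float, minutes_since_report: float) -> Status:
    """System pods: 100% Healthy, <100% Critical (no Warning band)."""
    if minutes_since_report > REPORT_TIMEOUT_MINUTES:
        return Status.UNKNOWN
    return Status.HEALTHY if availability_pct >= 100 else Status.CRITICAL


def node_status(availability_pct: float, minutes_since_report: float) -> Status:
    """Nodes: >85% Healthy, 60 - 84% Warning, <60% Critical."""
    if minutes_since_report > REPORT_TIMEOUT_MINUTES:
        return Status.UNKNOWN
    # Assumption: the table does not define exactly 85%; treated as Healthy here.
    if availability_pct >= 85:
        return Status.HEALTHY
    if availability_pct >= 60:
        return Status.WARNING
    return Status.CRITICAL


def cluster_status(user_pod: Status, system_pod: Status, node: Status) -> Status:
    """Overall status: the worst of the three states, with Unknown overriding all."""
    return max(user_pod, system_pod, node)


overall = cluster_status(
    user_pod_status(95, minutes_since_report=5),
    system_pod_status(100, minutes_since_report=5),
    node_status(90, minutes_since_report=5),
)
print(overall.name)  # WARNING
```

Ranking Unknown above Critical lets a single `max` call express both the worst-of-three rule and the Unknown exception.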