question

COLASANTOFRANCESCA-6756 asked prmanhas-MSFT commented

How to save costs when a Dev/Test (non-prod) AKS cluster is not in use

When our Dev/Test (non-prod) AKS clusters are not in use, we would like to shut them down, but we have verified that:


  • “az vmss deallocate” triggers a "missing Monitoring" compliance alert.


  • “az aks stop” causes the AKS PLE (private link endpoint) to be removed in the Regional Bastion. In addition, the state of a stopped AKS cluster is preserved for up to 12 months; if a cluster stays stopped for more than 12 months, its state cannot be recovered (see https://docs.microsoft.com/en-us/azure/aks/start-stop-cluster).


So the best way to reduce the costs of an AKS cluster when it isn't being used seems to be scaling the node count down manually.

Each of our NPRD AKS clusters has only a system node pool defined (e.g. agentpool), and "Every AKS cluster must contain at least one system node pool with at least one node." (according to https://docs.microsoft.com/en-us/azure/aks/use-system-pools).
So we can't scale the system node pool to 0. Scaling to 0 appears to be allowed only when a user node pool is defined alongside the system node pool: the user node pool can be scaled to 0, but the system node pool can't. We would also need to verify possible side effects.


Could you please confirm that what is described above is the expected behavior?

Are there any other ways to put our AKS clusters into a sort of stand-by state, so that we save costs when they are not in use without taking risks?

Any hints will be appreciated.

Thanks !





azure-kubernetes-service

@COLASANTOFRANCESCA-6756 Any update on the issue?

If you need any further help on the issue do let us know.

Thanks


1 Answer

Sam-Cogan answered COLASANTOFRANCESCA-6756 commented

First up, you should not use any of the VMSS commands against your AKS nodes. This will break things, so don't do that.
If you're using a private AKS cluster, then using the start/stop cluster functionality will cause the private link endpoints to need to be recreated. You could automate this, but it's not ideal.
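
For reference, a minimal sketch of the stop/start commands discussed above. The resource group and cluster names are placeholders, and `run`, `stop_cluster`, and `start_cluster` are hypothetical helpers. The script defaults to a dry run that only prints the `az` commands. It does not recreate private link endpoints; that step depends on your network setup and would have to be added separately.

```shell
#!/bin/sh
# Sketch only: names below are placeholders, not values from this thread.
set -eu

RG="${RG:-my-nprd-rg}"              # hypothetical resource group
CLUSTER="${CLUSTER:-my-nprd-aks}"   # hypothetical cluster name
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the az commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop the cluster (control plane and nodes).
stop_cluster() {
  run az aks stop --resource-group "$RG" --name "$CLUSTER"
}

# Start a previously stopped cluster.
start_cluster() {
  run az aks start --resource-group "$RG" --name "$CLUSTER"
}

stop_cluster
start_cluster
```

Set `DRY_RUN=0` to actually run the commands once the placeholder names have been replaced.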

So, given that, the only other option to save cost is scaling the node pools down as low as they can go. As you mention, the system pool needs to have at least one node, so you can't go down to 0. You can avoid doing this manually if you use the cluster autoscaler; it can scale down for you as long as there is no workload on the cluster requiring more nodes.
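
A rough sketch of both approaches with the Azure CLI, assuming a system pool called `agentpool` (the name from the question) and a hypothetical user pool called `userpool`. The resource group and cluster names are placeholders, and the helper functions are illustrative only. The script defaults to a dry run that just prints the commands.

```shell
#!/bin/sh
# Sketch only: names below are placeholders, not values from this thread.
set -eu

RG="${RG:-my-nprd-rg}"              # hypothetical resource group
CLUSTER="${CLUSTER:-my-nprd-aks}"   # hypothetical cluster name
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the az commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Manually scale one node pool: scale_pool <pool> <node-count>
scale_pool() {
  run az aks nodepool scale \
    --resource-group "$RG" --cluster-name "$CLUSTER" \
    --name "$1" --node-count "$2"
}

# Hand a pool to the cluster autoscaler: enable_autoscaler <pool> <min> <max>
enable_autoscaler() {
  run az aks nodepool update \
    --resource-group "$RG" --cluster-name "$CLUSTER" \
    --name "$1" --enable-cluster-autoscaler \
    --min-count "$2" --max-count "$3"
}

scale_pool agentpool 1   # system pool: at least one node must remain
scale_pool userpool 0    # user pool, if one exists: can be scaled to zero
```

Manual scaling and the autoscaler are alternatives for a given pool: once the autoscaler is enabled on a pool (for example `enable_autoscaler agentpool 1 3`), that pool is no longer scaled by hand.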


Thanks

So there are two options to save costs when an AKS cluster is not in use: 1) stopping the AKS cluster (az aks stop) and recreating the PLE when needed, or 2) scaling the node pool down manually or automatically (e.g. count=1).
