Azure Service Fabric Default Metrics Memory/CPU Balancing threshold not triggered

Deyvid Todorov 0 Reputation points
2024-01-30T11:00:58.3+00:00

Hello, we have a problem with memory allocation on our nodes: the amount of allocated memory differs greatly from node to node. We added fabricSettings entries to enable balancing on the CPU/Memory metrics and configured balancing and activity thresholds, but we do not see any event indicating that rebalancing is triggered. Some context: I am using a Silver-tier cluster with 5 seed (primary) nodes and 3 more nodes that are not seed nodes. I suspect the balancing threshold and activity threshold are configured incorrectly. Our nodes have 4 cores and 28 GB of memory. Here is our fabricSettings configuration:

{
    "name": "PlacementAndLoadBalancing",
    "parameters": [
        {
            "name": "AllowConstraintCheckFixesDuringApplicationUpgrade",
            "value": "false"
        },
        {
            "name": "AutoDetectAvailableResources",
            "value": "false"
        },
        {
            "name": "LoadBalancingEnabled",
            "value": "true"
        },
        {
            "name": "CpuPercentageNodeCapacity",
            "value": "0.8"
        },
        {
            "name": "MemoryPercentageNodeCapacity",
            "value": "0.8"
        },
        {
            "name": "PreventTransientOvercommit",
            "value": "true"
        }
    ]
},
{
    "name": "MetricBalancingThresholds",
    "parameters": [
        {
            "name": "servicefabric:/_CpuCores",
            "value": "2"
        },
        {
            "name": "servicefabric:/_MemoryInMB",
            "value": "1.5"
        }
    ]
},
{
    "name": "MetricActivityThresholds",
    "parameters": [
        {
            "name": "servicefabric:/_CpuCores",
            "value": "4"
        },
        {
            "name": "servicefabric:/_MemoryInMB",
            "value": "22000"
        }
    ]
}
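
To make the interaction of the two thresholds concrete, here is a minimal Python sketch of how I understand the rule from the balancing documentation linked below: for a given metric, balancing is considered only when the ratio of the most loaded node to the least loaded node exceeds the balancing threshold and at least one node's load exceeds the activity threshold. The function name, the zero-load handling, and the per-node load values are hypothetical, chosen only for illustration; they are not taken from our cluster, and the real Cluster Resource Manager may apply further conditions.

```python
def should_balance(node_loads, balancing_threshold, activity_threshold):
    """Return True if both thresholds are exceeded for one metric.

    Assumed rule (paraphrased from the balancing documentation):
      * max load / min load must be greater than the balancing threshold
      * at least one node's load must be greater than the activity threshold
    """
    highest = max(node_loads)
    lowest = min(node_loads)

    # Activity threshold: no node is busy enough, so nothing happens.
    if highest <= activity_threshold:
        return False

    # Assumption for this sketch: a zero minimum counts as imbalanced.
    if lowest == 0:
        return True

    return highest / lowest > balancing_threshold


# Hypothetical reported loads for servicefabric:/_MemoryInMB on 8 nodes.
memory_loads_mb = [21000, 9000, 12000, 8000, 15000, 10000, 11000, 7000]

# Values from the configuration above: ratio 3.0 exceeds 1.5, but no
# node exceeds the 22000 MB activity threshold.
with_current_settings = should_balance(memory_loads_mb, 1.5, 22000)

# Same loads with a much lower, hypothetical activity threshold.
with_lower_activity = should_balance(memory_loads_mb, 1.5, 1536)

print(with_current_settings, with_lower_activity)
```

Under these assumptions, the configured memory activity threshold of 22000 MB (roughly 21.5 GB on a 28 GB node) means a node has to report more than that load before balancing on this metric is even considered, regardless of how uneven the other nodes are.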

There is a huge difference in available memory between the nodes. [Image: Screenshot_132]

No event indicates that rebalancing has taken place. Usually, when we deploy a new version, we see eventCode 7147, which indicates a balancer move of services across the nodes. [Image: Screenshot_13] The Deviation Before and Deviation After fields have no value and stay that way, so I think balancing is not working properly. [Image: Screenshot_14]

We tried different values with no effect. I don't know what the cause could be or how to make it work. We have read the documentation in detail and believe we have configured the settings correctly, but we may be making a mistake somewhere.

Documentation:
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-metrics#custom-metrics
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-resource-governance
https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-balancing
