Autoscale a managed online endpoint (preview)
Autoscale automatically runs the right amount of resources to handle the load on your application. Managed endpoints support autoscaling through integration with the Azure Monitor autoscale feature.
Azure Monitor autoscaling supports a rich set of rules. You can configure metrics-based scaling (for instance, CPU utilization >70%), schedule-based scaling (for example, scaling rules for peak business hours), or a combination. For more information, see Overview of autoscale in Microsoft Azure.
Today, you can manage autoscaling by using the Azure CLI, REST, ARM, or the browser-based Azure portal. Other Azure ML SDKs, such as the Python SDK, will add support over time.
- A deployed endpoint. To create one, see Deploy and score a machine learning model by using a managed online endpoint (preview).
Define an autoscale profile
To enable autoscale for an endpoint, you first define an autoscale profile. This profile defines the default, minimum, and maximum scale set capacity. The example in this section sets the default and minimum capacity to two VM instances and the maximum capacity to five.
The following snippet sets the endpoint and deployment names:
# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue
Next, get the Azure Resource Manager ID of the deployment and endpoint:
# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the endpoint. todo: change to --query "id"
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for the autoscale settings of this deployment; appending a random number makes the name unique
AUTOSCALE_SETTINGS_NAME=autoscale-`echo $RANDOM`
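If either query returns nothing (for example, because of a typo in the endpoint name), the later autoscale commands fail with less obvious errors. The following is a minimal sketch of a guard for that case. `require_value` is a hypothetical helper introduced here for illustration; it is not part of the Azure CLI.

```shell
# Hypothetical helper: fail early when a variable that should hold an ARM ID is empty.
# $1 = variable name (for the message), $2 = its value
require_value() {
  if [ -z "$2" ]; then
    echo "error: $1 is empty; check the endpoint and deployment names" >&2
    return 1
  fi
  echo "$1 is set"
}

# Example with a placeholder value (not a real resource ID):
require_value DEPLOYMENT_RESOURCE_ID "placeholder-deployment-id"
```

In a script, a call such as `require_value ENDPOINT_RESOURCE_ID "$ENDPOINT_RESOURCE_ID" || exit 1` would stop before any autoscale command runs.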
The following snippet creates the autoscale profile:
az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2
For more information, see the reference page for autoscale.
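The three capacity values are related: the default count is expected to fall between the minimum and the maximum. The following sketch checks that relationship before the profile is created. `check_capacity` is a hypothetical helper, and the exact validation that Azure Monitor performs is an assumption here.

```shell
# Hypothetical helper: check that min <= default <= max before creating a profile.
# $1 = minimum count, $2 = default count, $3 = maximum count (integers)
check_capacity() {
  if [ "$1" -le "$2" ] && [ "$2" -le "$3" ]; then
    echo "ok: min=$1 default=$2 max=$3"
  else
    echo "invalid: need min <= default <= max (got $1, $2, $3)" >&2
    return 1
  fi
}

# Values from the profile above.
check_capacity 2 2 5   # prints "ok: min=2 default=2 max=5"
```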
Create a rule to scale out using metrics
A common scale-out rule increases the number of VM instances when the average CPU load is high. The following example allocates two more nodes (up to the maximum) if the average CPU load is greater than 70% for five minutes:
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2
The rule is part of the autoscale profile created earlier (autoscale-name matches the name of the profile, here the value of $AUTOSCALE_SETTINGS_NAME). The value of its condition argument says the rule should trigger when "The average CPU consumption among the VM instances exceeds 70% for five minutes." When that condition is satisfied, two more VM instances are allocated.
For more information on the CLI syntax, see az monitor autoscale.
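The condition string used above follows the pattern metric, operator, threshold, aggregation, window. The sketch below assembles such a string from its parts. `build_condition` is a hypothetical helper, and the operator list is an assumption based on the CLI help text; confirm it with `az monitor autoscale rule create --help`.

```shell
# Hypothetical helper: assemble a condition string of the form
#   <metric> <operator> <threshold> <aggregation> <window>
# $1 metric, $2 operator, $3 threshold, $4 aggregation, $5 window
build_condition() {
  case "$2" in
    '>'|'>='|'<'|'<='|'=='|'!=') ;;   # assumed set of accepted operators
    *) echo "unsupported operator: $2" >&2; return 1 ;;
  esac
  printf '%s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Rebuild the condition from the scale-out rule above.
build_condition CpuUtilizationPercentage '>' 70 avg 5m
# prints: CpuUtilizationPercentage > 70 avg 5m
```

The result could then be passed as `--condition "$(build_condition CpuUtilizationPercentage '>' 70 avg 5m)"`.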
Create a rule to scale in using metrics
When load is light, a scale-in rule can reduce the number of VM instances. The following example releases a single node, down to a minimum of 2, if the average CPU load is less than 25% for 5 minutes:
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 25 avg 5m" \
  --scale in 1
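To see how the scale-out and scale-in rules interact with the profile limits, the following sketch models one evaluation step using the thresholds from the two commands (70 and 25) and the limits from the profile (2 and 5). It is a simplified illustration, not Azure Monitor's actual algorithm, which also applies cooldown periods and other safeguards. `next_count` is a hypothetical function name.

```shell
# Simplified model of the two CPU rules; NOT Azure Monitor's real algorithm.
MIN_COUNT=2
MAX_COUNT=5

# next_count <current instance count> <average CPU % over the window, integer>
next_count() {
  count="$1"
  cpu="$2"
  if [ "$cpu" -gt 70 ]; then
    count=$((count + 2))      # scale-out rule: add two instances
  elif [ "$cpu" -lt 25 ]; then
    count=$((count - 1))      # scale-in rule: remove one instance
  fi
  # Clamp the result to the profile's minimum and maximum.
  if [ "$count" -gt "$MAX_COUNT" ]; then count="$MAX_COUNT"; fi
  if [ "$count" -lt "$MIN_COUNT" ]; then count="$MIN_COUNT"; fi
  echo "$count"
}

next_count 2 85   # prints 4
next_count 4 85   # prints 5 (capped at the maximum)
next_count 3 10   # prints 2
next_count 2 10   # prints 2 (already at the minimum)
next_count 3 50   # prints 3 (neither rule triggers)
```

Leaving a gap between the scale-in and scale-out thresholds means that a moderate load, such as 50%, changes nothing, which helps avoid rapid back-and-forth scaling.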
Create a scaling rule based on endpoint metrics
The previous rules applied to the deployment. Now, add a rule that applies to the endpoint. In this example, if the average request latency is greater than 70 milliseconds for 5 minutes, another node is allocated.
az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "RequestLatency > 70 avg 5m" \
  --scale out 1 \
  --resource $ENDPOINT_RESOURCE_ID
Create scaling rules based on a schedule
You can also create rules that apply only on certain days or at certain times. In this example, the node count is set to 2 on the weekend.
az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time"
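With this recurrence, the weekend profile applies on Saturday and Sunday, and the profile defined earlier applies on the other days. The sketch below illustrates that selection. `profile_for_day` is a hypothetical helper, and the label "default profile" is only a description used here, not a confirmed resource name.

```shell
# Hypothetical helper: report which profile would apply on a given day.
# $1 = three-letter lowercase day name, matching the --recurrence values (sat, sun, ...)
profile_for_day() {
  case "$1" in
    sat|sun) echo "weekend-profile" ;;
    mon|tue|wed|thu|fri) echo "default profile" ;;
    *) echo "unknown day: $1" >&2; return 1 ;;
  esac
}

profile_for_day sat   # prints "weekend-profile"
profile_for_day wed   # prints "default profile"
```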
If you are not going to use your deployments, delete the endpoint, which also deletes its underlying deployments:
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait
To learn more about autoscale with Azure Monitor, see the following articles: