
Autoscale an online endpoint

Article
04/04/2023

Applies to: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

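Autoscale automatically runs the right amount of resources to handle the load on your application. Online endpoints support autoscaling through integration with the Azure Monitor autoscale feature.

Azure Monitor autoscale supports a rich set of rules. You can configure metrics-based scaling (for example, CPU utilization >70%), schedule-based scaling (for example, scaling rules for peak business hours), or a combination of both. For more information, see Overview of autoscale in Microsoft Azure.

Diagram showing autoscale adding and removing instances as needed.

You can currently manage autoscaling with the Azure CLI, REST, ARM, or the browser-based Azure portal. Support in other Azure Machine Learning SDKs, such as the Python SDK, will be added over time. The Python examples in this article call Azure Monitor through the azure-mgmt-monitor package.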

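Prerequisites

A deployed endpoint. See Deploy and score a machine learning model by using an online endpoint.
To use autoscale, the identity that manages autoscale must have the microsoft.insights/autoscalesettings/write permission. Any built-in or custom role that allows this action works. For general guidance on managing roles for Azure Machine Learning, see Manage users and roles. For more information on autoscale settings in Azure Monitor, see Microsoft.Insights autoscalesettings.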

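Define an autoscale profile

To enable autoscale for an endpoint, you first define an autoscale profile. This profile defines the default, minimum, and maximum scale set capacity. The following example sets the default and minimum capacity to two VM instances and the maximum capacity to five:

Applies to: Azure CLI ml extension v2 (current)

The following snippet sets the endpoint and deployment names: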

# set your existing endpoint name
ENDPOINT_NAME=your-endpoint-name
DEPLOYMENT_NAME=blue

Next, get the Azure Resource Manager IDs of the deployment and the endpoint:

# ARM id of the deployment
DEPLOYMENT_RESOURCE_ID=$(az ml online-deployment show -e $ENDPOINT_NAME -n $DEPLOYMENT_NAME -o tsv --query "id")
# ARM id of the endpoint. todo: change to --query "id"
ENDPOINT_RESOURCE_ID=$(az ml online-endpoint show -n $ENDPOINT_NAME -o tsv --query "properties.\"azureml.onlineendpointid\"")
# set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
AUTOSCALE_SETTINGS_NAME=autoscale-$ENDPOINT_NAME-$DEPLOYMENT_NAME-`echo $RANDOM`
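
To make the two resource IDs less opaque, here is a minimal sketch of how a deployment ID is laid out and how it can be split back into its parts. It assumes the usual Azure Resource Manager key/value path layout under the Microsoft.MachineLearningServices provider. The helper names `deployment_resource_id` and `parse_resource_id` are hypothetical, and the value returned by `az ml online-deployment show` remains the authoritative one.

```python
def deployment_resource_id(subscription_id, resource_group, workspace,
                           endpoint_name, deployment_name):
    """Build a deployment ID, assuming the standard ARM path layout."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace}"
        f"/onlineEndpoints/{endpoint_name}"
        f"/deployments/{deployment_name}"
    )


def parse_resource_id(resource_id):
    """Split an ARM ID into a dict; segments alternate between key and value."""
    parts = resource_id.strip("/").split("/")
    return dict(zip(parts[0::2], parts[1::2]))


# Placeholder values, not real resources.
rid = deployment_resource_id("sub-id", "my-rg", "my-ws", "my-endpoint", "blue")
info = parse_resource_id(rid)
print(info["onlineEndpoints"], info["deployments"])
```

Parsing the ID this way is mainly useful for logging or for checking that an autoscale setting targets the deployment you expect.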

The following snippet creates the autoscale profile:

az monitor autoscale create \
  --name $AUTOSCALE_SETTINGS_NAME \
  --resource $DEPLOYMENT_RESOURCE_ID \
  --min-count 2 --max-count 5 --count 2

Note

For more information, see the reference page for autoscale.

Applies to: Python SDK azure-ai-ml v2 (current)

Import modules:

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import AutoscaleProfile, ScaleRule, MetricTrigger, ScaleAction, Recurrence, RecurrentSchedule
import random 
import datetime

Define variables for the workspace, endpoint, and deployment:

subscription_id = "<YOUR-SUBSCRIPTION-ID>"
resource_group = "<YOUR-RESOURCE-GROUP>"
workspace = "<YOUR-WORKSPACE>"

endpoint_name = "<YOUR-ENDPOINT-NAME>"
deployment_name = "blue"

Get the Azure Machine Learning and Azure Monitor clients:

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential, subscription_id, resource_group, workspace
)

mon_client = MonitorManagementClient(
    credential, subscription_id
)

Get the endpoint and deployment objects:

deployment = ml_client.online_deployments.get(
    deployment_name, endpoint_name
)

endpoint = ml_client.online_endpoints.get(
    endpoint_name
)

Create the autoscale profile:

# Set a unique name for autoscale settings for this deployment. The below will append a random number to make the name unique.
autoscale_settings_name = f"autoscale-{endpoint_name}-{deployment_name}-{random.randint(0,1000)}"

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = []
            )
        ]
    }
)

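Create a rule to scale out using metrics

A common scale-out rule increases the number of VM instances when the average CPU load is high. The following example allocates two more nodes (up to the maximum) if the average CPU load stays above 70% for five minutes:

Applies to: Azure CLI ml extension v2 (current)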

az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage > 70 avg 5m" \
  --scale out 2

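The rule is added to the autoscale settings identified by $AUTOSCALE_SETTINGS_NAME (the --autoscale-name value must match the --name used when the settings were created). The condition argument states that the rule triggers when "the average CPU consumption among the VM instances exceeds 70% for five minutes". When that condition is satisfied, two more VM instances are allocated.

Note

For more information on the CLI syntax, see az monitor autoscale.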
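
The condition string packs a metric name, a comparison, a threshold, an aggregation, and a time window into one argument. The sketch below pulls those pieces apart for the simple form used in this article. It is an illustration and not the CLI's own parser: the `parse_condition` function, the regular expression, and the output key names are assumptions, and optional parts of the real syntax (such as a metric namespace or dimension filters) are not handled.

```python
import re
from datetime import timedelta

# Comparison symbols mapped to the operator names used by Azure Monitor rules.
OPERATORS = {
    ">": "GreaterThan", ">=": "GreaterThanOrEqual",
    "<": "LessThan", "<=": "LessThanOrEqual",
    "==": "Equals", "!=": "NotEquals",
}
AGGREGATIONS = {"avg": "Average", "min": "Minimum", "max": "Maximum",
                "total": "Total", "count": "Count"}
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

CONDITION = re.compile(
    r"^(?P<metric>.+?)\s+(?P<op>>=|<=|==|!=|>|<)\s+(?P<threshold>\d+(?:\.\d+)?)"
    r"\s+(?P<agg>avg|min|max|total|count)\s+(?P<amount>\d+)(?P<unit>[smhd])$"
)


def parse_condition(condition):
    """Split 'METRIC OP THRESHOLD AGG PERIOD' into named fields."""
    match = CONDITION.match(condition.strip())
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    window = timedelta(**{UNITS[match["unit"]]: int(match["amount"])})
    return {
        "metric_name": match["metric"],
        "operator": OPERATORS[match["op"]],
        "threshold": float(match["threshold"]),
        "aggregation": AGGREGATIONS[match["agg"]],
        "time_window": window,
    }


parsed = parse_condition("CpuUtilizationPercentage > 70 avg 5m")
print(parsed)
```

Reading the condition this way makes it easier to compare a CLI rule with the equivalent fields of a rule defined through the Python SDK.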

Applies to: Python SDK azure-ai-ml v2 (current)

Create the rule definition:

rule_scale_out = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri = deployment.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "GreaterThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 70
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 2, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

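This rule refers to the average of CpuUtilizationPercentage over the last five minutes, as set by metric_name, time_window, and time_aggregation. When the metric value exceeds the threshold of 70, two more VM instances are allocated.

Update the my-scale-settings profile to include this rule: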

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out
                ]
            )
        ]
    }
)
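
To see what the scale-out rule amounts to, the following sketch evaluates it against a list of per-minute CPU samples. It is a simplified local model and not how Azure Monitor is implemented: the `window_average` and `scale_out` functions are hypothetical, and the model assumes one sample per minute and a plain average over the window.

```python
from statistics import mean


def window_average(samples, window):
    """Average of the most recent `window` samples (one sample per minute)."""
    return mean(samples[-window:])


def scale_out(current, samples, *, threshold=70, window=5, step=2, maximum=5):
    """Return the instance count after applying the scale-out rule once."""
    if len(samples) < window:
        return current  # not enough data to fill the time window yet
    if window_average(samples, window) > threshold:
        return min(current + step, maximum)  # never exceed the profile maximum
    return current


cpu = [60, 72, 75, 80, 90]        # average is 75.4, above the threshold
print(scale_out(2, cpu))          # two instances are added
print(scale_out(4, cpu))          # capped at the maximum of five
```

The cap in `min(current + step, maximum)` is the reason the rule adds "up to the maximum" rather than always adding two instances.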

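Create a rule to scale in using metrics

When load is light, a scale-in rule can reduce the number of VM instances. The following example releases a single node, down to a minimum of two, if the CPU load stays below 30% for five minutes:

Applies to: Azure CLI ml extension v2 (current)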

az monitor autoscale rule create \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --condition "CpuUtilizationPercentage < 30 avg 5m" \
  --scale in 1

Applies to: Python SDK azure-ai-ml v2 (current)

Create the rule definition:

rule_scale_in = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri = deployment.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "LessThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 30
    ), 
    scale_action = ScaleAction(
        direction = "Decrease",
        type = "ChangeCount", 
        value = 1, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

Update the my-scale-settings profile to include this rule:

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out, 
                    rule_scale_in
                ]
            )
        ]
    }
)
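
With both rules in place, the instance count moves inside a band: above 70% CPU it grows, below 30% it shrinks, and in between it stays put. The sketch below models that band together with the one-hour cooldown from the rule definitions. The `Autoscaler` class is hypothetical, and treating the cooldown as a single timer shared by both rules is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Autoscaler:
    minimum: int = 2
    maximum: int = 5
    count: int = 2
    cooldown: timedelta = timedelta(hours=1)
    last_scaled: Optional[datetime] = None

    def step(self, avg_cpu, now):
        """Apply the scale-out and scale-in rules once; return the new count."""
        in_cooldown = (self.last_scaled is not None
                       and now - self.last_scaled < self.cooldown)
        if in_cooldown:
            return self.count
        if avg_cpu > 70:
            target = min(self.count + 2, self.maximum)
        elif avg_cpu < 30:
            target = max(self.count - 1, self.minimum)
        else:
            target = self.count  # between the thresholds nothing changes
        if target != self.count:
            self.count = target
            self.last_scaled = now
        return self.count


scaler = Autoscaler()
start = datetime(2023, 4, 4, 9, 0)
history = [
    scaler.step(85, start),                          # scale out by two
    scaler.step(85, start + timedelta(minutes=10)),  # blocked by the cooldown
    scaler.step(85, start + timedelta(hours=1)),     # capped at the maximum
    scaler.step(20, start + timedelta(hours=2)),     # scale in by one
]
print(history)
```

Keeping a gap between the two thresholds avoids the count flapping up and down when the load hovers near a single value.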

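Create a scaling rule based on endpoint metrics

The previous rules applied to the deployment. Now add a rule that applies to the endpoint. In this example, another node is allocated if the request latency averages more than 70 milliseconds for five minutes.

Applies to: Azure CLI ml extension v2 (current)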

az monitor autoscale rule create \
 --autoscale-name $AUTOSCALE_SETTINGS_NAME \
 --condition "RequestLatency > 70 avg 5m" \
 --scale out 1 \
 --resource $ENDPOINT_RESOURCE_ID

Applies to: Python SDK azure-ai-ml v2 (current)

Create the rule definition:

rule_scale_out_endpoint = ScaleRule(
    metric_trigger = MetricTrigger(
        metric_name="RequestLatency",
        metric_resource_uri = endpoint.id, 
        time_grain = datetime.timedelta(minutes = 1),
        statistic = "Average",
        operator = "GreaterThan", 
        time_aggregation = "Last",
        time_window = datetime.timedelta(minutes = 5), 
        threshold = 70
    ), 
    scale_action = ScaleAction(
        direction = "Increase", 
        type = "ChangeCount", 
        value = 1, 
        cooldown = datetime.timedelta(hours = 1)
    )
)

The metric_resource_uri field of this rule now refers to the endpoint rather than the deployment.

Update the my-scale-settings profile to include this rule:

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="my-scale-settings",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 5,
                    "default" : 2
                },
                rules = [
                    rule_scale_out, 
                    rule_scale_in,
                    rule_scale_out_endpoint
                ]
            )
        ]
    }
)
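
The profile now holds three rules that watch two different metrics, so it helps to be clear about how they combine. The sketch below assumes the behavior described in the Azure Monitor autoscale documentation as I understand it: a scale-out happens when any scale-out rule is met, and a scale-in happens only when every scale-in rule is met. The `Rule` class and `target_count` function are hypothetical, and the metric values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str
    operator: str    # "GreaterThan" or "LessThan"
    threshold: float
    direction: str   # "Increase" or "Decrease"
    change: int

    def is_met(self, metrics):
        value = metrics[self.metric]
        if self.operator == "GreaterThan":
            return value > self.threshold
        return value < self.threshold


def target_count(current, rules, metrics, minimum=2, maximum=5):
    """Combine rules: any scale-out rule wins; scale-in needs all of them."""
    met_out = [r for r in rules if r.direction == "Increase" and r.is_met(metrics)]
    if met_out:
        return min(current + max(r.change for r in met_out), maximum)
    scale_in = [r for r in rules if r.direction == "Decrease"]
    if scale_in and all(r.is_met(metrics) for r in scale_in):
        return max(current - min(r.change for r in scale_in), minimum)
    return current


RULES = [
    Rule("CpuUtilizationPercentage", "GreaterThan", 70, "Increase", 2),
    Rule("CpuUtilizationPercentage", "LessThan", 30, "Decrease", 1),
    Rule("RequestLatency", "GreaterThan", 70, "Increase", 1),
]

# CPU is low, but latency is high: the latency rule still forces a scale-out.
print(target_count(3, RULES, {"CpuUtilizationPercentage": 20, "RequestLatency": 120}))
```

Under this model, a low CPU reading does not shrink the deployment while the endpoint latency rule is still firing.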

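Create scaling rules based on a schedule

You can also create rules that apply only on certain days or at certain times. This example sets the node count to 2 on weekends.

Applies to: Azure CLI ml extension v2 (current)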

az monitor autoscale profile create \
  --name weekend-profile \
  --autoscale-name $AUTOSCALE_SETTINGS_NAME \
  --min-count 2 --count 2 --max-count 2 \
  --recurrence week sat sun --timezone "Pacific Standard Time"

Applies to: Python SDK azure-ai-ml v2 (current)

mon_client.autoscale_settings.create_or_update(
    resource_group, 
    autoscale_settings_name, 
    parameters = {
        "location" : endpoint.location,
        "target_resource_uri" : deployment.id,
        "profiles" : [
            AutoscaleProfile(
                name="Default",
                capacity={
                    "minimum" : 2, 
                    "maximum" : 2,
                    "default" : 2
                },
                recurrence = Recurrence(
                    frequency = "Week", 
                    schedule = RecurrentSchedule(
                        time_zone = "Pacific Standard Time", 
                        days = ["Saturday", "Sunday"], 
                        hours = [], 
                        minutes = []
                    )
                )
            )
        ]
    }
)
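
A recurring profile effectively swaps the capacity limits depending on the day. The sketch below shows that selection locally: on Saturday and Sunday the weekend limits apply, otherwise the regular 2 to 5 range does. The `active_profile` and `clamp` functions and the plain-dict profiles are hypothetical, and the time passed in is assumed to already be local time in the profile's time zone.

```python
from datetime import datetime

DEFAULT_PROFILE = {"name": "my-scale-settings", "minimum": 2, "maximum": 5}
WEEKEND_PROFILE = {"name": "weekend-profile", "minimum": 2, "maximum": 2}

# datetime.weekday() counts from Monday = 0, so Saturday is 5 and Sunday is 6.
WEEKEND_DAYS = {5, 6}


def active_profile(local_time):
    """Pick the profile that applies at the given local time."""
    if local_time.weekday() in WEEKEND_DAYS:
        return WEEKEND_PROFILE
    return DEFAULT_PROFILE


def clamp(count, profile):
    """Force an instance count into the profile's allowed range."""
    return max(profile["minimum"], min(count, profile["maximum"]))


saturday = datetime(2023, 4, 8, 12, 0)
monday = datetime(2023, 4, 10, 12, 0)
print(active_profile(saturday)["name"], clamp(4, active_profile(saturday)))
print(active_profile(monday)["name"], clamp(4, active_profile(monday)))
```

Because the weekend profile sets the minimum and maximum to the same value, the count is pinned at two regardless of load until the regular profile applies again.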

Delete resources

If you are not going to use your deployments, delete them:

Applies to: Azure CLI ml extension v2 (current)

# delete the autoscaling profile
az monitor autoscale delete -n "$AUTOSCALE_SETTINGS_NAME"

# delete the endpoint
az ml online-endpoint delete --name $ENDPOINT_NAME --yes --no-wait

Applies to: Python SDK azure-ai-ml v2 (current)

mon_client.autoscale_settings.delete(
    resource_group, 
    autoscale_settings_name
)

ml_client.online_endpoints.begin_delete(endpoint_name)

Next steps

To learn more about autoscale with Azure Monitor, see the Azure Monitor autoscale documentation.
