
Deploy machine learning models to Azure

Learn how to deploy your machine learning or deep learning model as a web service in the Azure cloud. You can also deploy to Azure IoT Edge devices.

The workflow is similar no matter where you deploy your model:

  1. Register the model (optional, see below).
  2. Prepare an inference configuration (unless using no-code deployment).
  3. Prepare an entry script (unless using no-code deployment).
  4. Choose a compute target.
  5. Deploy the model to the compute target.
  6. Test the resulting web service.

For more information on the concepts involved in the machine learning deployment workflow, see Manage, deploy, and monitor models with Azure Machine Learning.


Connect to your workspace

Follow the directions in the Azure CLI documentation for setting your subscription context.

Then run the following command to see the workspaces you have access to:

az ml workspace list --resource-group=<my resource group>

Register your model (optional)

A registered model is a logical container for one or more files that make up your model. For example, if you have a model that's stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.


Registering a model for version tracking is recommended but not required. If you would rather proceed without registering a model, you will need to specify a source directory in your InferenceConfig or inferenceconfig.json and ensure that your model resides within that source directory.


When you register a model, you provide the path of either a cloud location (from a training run) or a local directory. This path is used only to locate the files for upload as part of the registration process. It doesn't need to match the path used in the entry script. For more information, see Locate model files in your entry script.


When using the Filter by Tags option on the Models page of Azure Machine Learning studio, instead of TagName : TagValue, use TagName=TagValue (without spaces).

The following examples demonstrate how to register a model.

Register a model from an Azure ML training run

az ml model register -n sklearn_mnist --asset-path outputs/sklearn_mnist_model.pkl --experiment-name myexperiment --run-id myrunid --tag area=mnist


If you get an error message stating that the ml extension isn't installed, use the following command to install it:

az extension add -n azure-cli-ml

The --asset-path parameter refers to the cloud location of the model. In this example, the path of a single file is used. To include multiple files in the model registration, set --asset-path to the path of a folder that contains the files.

Register a model from a local file

az ml model register -n onnx_mnist -p mnist/model.onnx

To include multiple files in the model registration, set -p to the path of a folder that contains the files.

For more information on az ml model register, consult the reference documentation.

Define an entry script

The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model: it must understand the data that the model expects and returns.

The entry script must accomplish two things:

  1. Loading your model (using a function called init())
  2. Running your model on input data (using a function called run())

Let's go through these steps in detail.
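As a sketch, a minimal entry script with the two required functions looks like the following. The model-loading logic is a placeholder here; the sections below show the real loading patterns.

```python
# Minimal entry script sketch. Azure ML calls init() once when the
# service container starts, and run() once per scoring request.

def init():
    global model
    # A real script would load the model files here (see the sections below).
    model = None

def run(data):
    # 'data' is the request payload; return anything JSON-serializable.
    return {"received": data}
```

The `global model` pattern matters: init() runs once, and run() reuses the loaded model across requests instead of reloading it each time.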

Writing init()

Loading a registered model

Your registered models are stored at a path pointed to by an environment variable called AZUREML_MODEL_DIR. For more information on the exact directory structure, see Locate model files in your entry script.
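For example, init() typically builds the model path like this. The file name sklearn_mnist_model.pkl is illustrative, and the fallback to '.' is only so the snippet runs outside a deployment:

```python
import os

# AZUREML_MODEL_DIR points at the root folder of the deployed model files.
# Falling back to '.' lets this snippet run locally for testing.
model_dir = os.getenv('AZUREML_MODEL_DIR', '.')
model_path = os.path.join(model_dir, 'sklearn_mnist_model.pkl')
print(model_path)
```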

Loading a local model

If you opted against registering your model and passed it as part of your source directory, you can read it in as you would locally, by passing the path to the model relative to your scoring script. For example, if you had a directory structured as:

- source_dir
    - score.py
    - models
        - model1.onnx

you could load your model with the following Python code:

import os

# Open the model file by its path relative to the scoring script.
# ONNX models are binary, so open in binary mode ('rb').
model = open(os.path.join('.', 'models', 'model1.onnx'), 'rb')

Writing run()

run() is executed every time your model receives a scoring request, and expects the body of the request to be a JSON document with the following structure:

    {
        "data": <model-specific-data-structure>
    }

The input to run() is a Python string containing whatever follows the "data" key.
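To illustrate with a made-up 2x3 payload, the string that run() receives can be turned back into a numpy array like this:

```python
import json

import numpy as np

# Simulate the serialized value of the "data" key as it arrives in run().
raw = json.dumps([[1, 2, 3], [4, 5, 6]])

# Deserialize it into a numpy array, as the scoring example below does.
arr = np.array(json.loads(raw))
print(arr.shape)  # (2, 3)
```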

The following example demonstrates how to load a registered scikit-learn model and score it with numpy data:

import json
import numpy as np
import os
from sklearn.externals import joblib

def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')
    model = joblib.load(model_path)

def run(data):
    try:
        data = np.array(json.loads(data))
        result = model.predict(data)
        # You can return any data type, as long as it is JSON serializable.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

For more advanced examples, including automatic Swagger schema generation and binary (for example, image) data, read the article on advanced entry script authoring.

Define an inference configuration

An inference configuration describes how to set up the web service containing your model. It's used later, when you deploy the model.

A minimal inference configuration can be written as:

    {
        "entryScript": "score.py",
        "sourceDirectory": "./working_dir"
    }

This specifies that the machine learning deployment will use the file score.py in the ./working_dir directory to process incoming requests.
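A slightly fuller configuration might also pin the runtime and a conda environment file. The entryScript and sourceDirectory values are from the example above; runtime and condaFile (with the hypothetical file name myenv.yml) are shown here as an illustrative sketch:

```json
{
    "entryScript": "score.py",
    "sourceDirectory": "./working_dir",
    "runtime": "python",
    "condaFile": "myenv.yml"
}
```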

See this article for a more thorough discussion of inference configurations.


For information on using a custom Docker image with an inference configuration, see How to deploy a model using a custom Docker image.

Choose a compute target

The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use this table to choose an appropriate compute target.

| Compute target | Used for | GPU support | FPGA support | Description |
| --- | --- | --- | --- | --- |
| Local web service | Testing/debugging | | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. Supported in the designer. |
| Azure Container Instances | Testing or development | | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Supported in the designer. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. |


Although compute targets like local, Azure Machine Learning compute, and Azure Machine Learning compute clusters support GPU for training and experimentation, using GPU for inference when deployed as a web service is supported only on AKS.

Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning compute.

When choosing a cluster SKU, first scale up and then scale out. Start with a machine that has 150% of the RAM your model requires, profile the result, and find a machine that has the performance you need. Once you've learned that, increase the number of machines to fit your need for concurrent inference.
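As a worked example of that rule of thumb (the 4 GB model footprint is made up):

```python
# Rule of thumb from above: start with ~150% of the model's RAM footprint,
# profile, and only then scale out for concurrency.
model_ram_gb = 4.0                      # illustrative model memory footprint
starting_ram_gb = model_ram_gb * 1.5    # initial SKU sizing target
print(starting_ram_gb)                  # 6.0
```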


  • Container instances are suitable only for small models less than 1 GB in size.
  • Use single-node AKS clusters for dev/test of larger models.

Define a deployment configuration

The options available for a deployment configuration differ depending on the compute target you choose.

The entries in the deploymentconfig.json document map to the parameters for LocalWebservice.deploy_configuration. The following table describes the mapping between the entities in the JSON document and the parameters for the method:

| JSON entity | Method parameter | Description |
| --- | --- | --- |
| computeType | NA | The compute target. For local targets, the value must be local. |
| port | port | The local port on which to expose the service's HTTP endpoint. |

This JSON is an example deployment configuration for use with the CLI:

    {
        "computeType": "local",
        "port": 32267
    }

For more information, see this reference.

Deploy your machine learning model

You are now ready to deploy your model.

Using a registered model

If you registered your model in your Azure Machine Learning workspace, replace "mymodel:1" with the name of your model and its version number.

az ml model deploy -m mymodel:1 --ic inferenceconfig.json --dc deploymentconfig.json

Using a local model

If you would prefer not to register your model, you can pass the "sourceDirectory" parameter in your inferenceconfig.json to specify a local directory from which to serve your model.

az ml model deploy --ic inferenceconfig.json --dc deploymentconfig.json

Understand service state

During model deployment, you may see the service state change while it fully deploys.

The following table describes the different service states:

| Webservice state | Description | Final state? |
| --- | --- | --- |
| Transitioning | The service is in the process of deployment. | No |
| Unhealthy | The service has deployed but is currently unreachable. | No |
| Unschedulable | The service cannot be deployed at this time due to lack of resources. | No |
| Failed | The service has failed to deploy due to an error or crash. | Yes |
| Healthy | The service is healthy and the endpoint is available. | Yes |


When deploying, Docker images for compute targets are built and loaded from Azure Container Registry (ACR). By default, Azure Machine Learning creates an ACR that uses the basic service tier. Changing the ACR for your workspace to the standard or premium tier may reduce the time it takes to build and deploy images to your compute targets. For more information, see Azure Container Registry service tiers.


If you are deploying a model to Azure Kubernetes Service (AKS), we recommend that you enable Azure Monitor for that cluster. This will help you understand overall cluster health and resource usage.

If you try to deploy a model to an unhealthy or overloaded cluster, you should expect to experience issues. If you need help troubleshooting AKS cluster problems, contact AKS Support.

Batch inference

Azure Machine Learning Compute targets are created and managed by Azure Machine Learning. They can be used for batch prediction from Azure Machine Learning pipelines.

For a walkthrough of batch inference with Azure Machine Learning Compute, see How to run batch predictions.

IoT Edge inference

Support for deploying to the edge is in preview. For more information, see Deploy Azure Machine Learning as an IoT Edge module.

Delete resources

To delete a deployed webservice, use az ml service delete <name of webservice>.

To delete a registered model from your workspace, use az ml model delete <model id>.

Read more about deleting a webservice and deleting a model.

Next steps