
How Azure Machine Learning service works: Architecture and concepts

Learn about the architecture, concepts, and workflow for Azure Machine Learning service. The major components of the service and the general workflow for using the service are shown in the following diagram:

[Diagram: Azure Machine Learning service architecture and workflow]

Workflow

The machine learning workflow generally follows this sequence:

  1. Develop machine learning training scripts in Python or with the visual interface.
  2. Create and configure a compute target.
  3. Submit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to a datastore. The records of execution are saved as runs in the workspace and grouped under experiments.
  4. Query the experiment for logged metrics from the current and past runs. If the metrics don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.
  5. After a satisfactory run is found, register the persisted model in the model registry.
  6. Develop a scoring script that uses the model, and deploy the model as a web service in Azure or to an IoT Edge device.

You perform these steps with any of the following:

Glossary of concepts

Note

Although this article defines terms and concepts used by Azure Machine Learning service, it does not define terms and concepts for the Azure platform. For more information about Azure platform terminology, see the Microsoft Azure glossary.

Workspaces

The workspace is the top-level resource for Azure Machine Learning service. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning service.

A taxonomy of the workspace is illustrated in the following diagram:

[Diagram: Workspace taxonomy]

For more information about workspaces, see What is an Azure Machine Learning workspace?

Experiments

An experiment is a grouping of many runs from a specified script. It always belongs to a workspace. When you submit a run, you provide an experiment name. Information for the run is stored under that experiment. If you submit a run and specify an experiment name that doesn't exist, a new experiment with that name is automatically created.
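The grouping behavior described above can be pictured with a small in-memory sketch. This is not the Azure Machine Learning SDK: `ToyWorkspace` and its `submit` method are hypothetical names invented for illustration, and the only point is that runs are stored under an experiment name and that an unknown name creates a new experiment on first use.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyWorkspace:
    """Hypothetical in-memory stand-in for a workspace; not the SDK class."""
    # experiment name -> list of run records
    experiments: dict = field(default_factory=lambda: defaultdict(list))
    _ids: count = field(default_factory=lambda: count(1))

    def submit(self, experiment_name: str, script: str) -> dict:
        # Indexing a missing name creates the experiment implicitly,
        # mirroring "a new experiment ... is automatically created".
        run = {"id": next(self._ids), "script": script}
        self.experiments[experiment_name].append(run)
        return run


ws = ToyWorkspace()
ws.submit("mnist", "train.py")
ws.submit("mnist", "train.py")
ws.submit("churn", "fit.py")

# Count the runs grouped under each experiment.
run_counts = {name: len(runs) for name, runs in ws.experiments.items()}
print(run_counts)
```

Here two runs end up grouped under `mnist`, and `churn` exists only because a run was submitted with that name.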

For an example of using an experiment, see Quickstart: Get started with Azure Machine Learning service.

Models

At its simplest, a model is a piece of code that takes an input and produces output. Creating a machine learning model involves selecting an algorithm, providing it with data, and tuning hyperparameters. Training is an iterative process that produces a trained model, which encapsulates what the model learned during the training process.

A model is produced by a run in Azure Machine Learning. You can also use a model that's trained outside of Azure Machine Learning. You can register a model in an Azure Machine Learning service workspace.

Azure Machine Learning service is framework agnostic. When you create a model, you can use any popular machine learning framework, such as scikit-learn, XGBoost, PyTorch, TensorFlow, and Chainer.

For an example of training a model, see Tutorial: Train an image classification model with Azure Machine Learning service.

The model registry keeps track of all the models in your Azure Machine Learning service workspace.

Models are identified by name and version. Each time you register a model with the same name as an existing one, the registry assumes that it's a new version. The version is incremented, and the new model is registered under the same name.

When you register the model, you can provide additional metadata tags and then use the tags when you search for models.

You can't delete models that are being used by an active deployment.
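The registry rules above (name plus version, auto-incrementing versions, tag search, and no deletion while deployed) can be sketched as a toy class. `ToyModelRegistry` and all of its methods are hypothetical and exist only to illustrate the semantics; they are not the service's API, and starting versions at 1 is an assumption of this sketch.

```python
class ToyModelRegistry:
    """Hypothetical registry illustrating name/version semantics; not the SDK."""

    def __init__(self):
        self._models = {}       # (name, version) -> tags dict
        self._deployed = set()  # (name, version) pairs used by active deployments

    def register(self, name, tags=None):
        # Re-registering an existing name yields the next version number.
        # Assumption: versions start at 1.
        version = 1 + max((v for n, v in self._models if n == name), default=0)
        self._models[(name, version)] = dict(tags or {})
        return version

    def find_by_tag(self, key, value):
        # Return every (name, version) whose tags contain key == value.
        return sorted(k for k, tags in self._models.items() if tags.get(key) == value)

    def mark_deployed(self, name, version):
        self._deployed.add((name, version))

    def delete(self, name, version):
        # Models used by an active deployment cannot be deleted.
        if (name, version) in self._deployed:
            raise RuntimeError("model is used by an active deployment")
        del self._models[(name, version)]


registry = ToyModelRegistry()
v1 = registry.register("digits", tags={"framework": "scikit-learn"})
v2 = registry.register("digits", tags={"framework": "pytorch"})
registry.mark_deployed("digits", v2)
print(v1, v2, registry.find_by_tag("framework", "pytorch"))
```

In this sketch, calling `registry.delete("digits", v2)` raises an error because that version is marked as deployed, while the first version can still be removed.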

For an example of registering a model, see Train an image classification model with Azure Machine Learning.

Run configurations

A run configuration is a set of instructions that defines how a script should be run in a specified compute target. The configuration includes a wide set of behavior definitions, such as whether to use an existing Python environment or to use a Conda environment that's built from a specification.

A run configuration can be persisted into a file inside the directory that contains your training script, or it can be constructed as an in-memory object and used to submit a run.
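The two forms of a run configuration, in-memory object and persisted file, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ToyRunConfig`, its field names, and the JSON file format are hypothetical and do not reflect the file format or classes the service actually uses.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ToyRunConfig:
    """Hypothetical run configuration; field names are illustrative only."""
    script: str
    compute_target: str = "local"
    use_existing_python: bool = True  # False: build a Conda environment from a spec
    conda_spec: str = ""

    def save(self, directory: Path, name: str = "toy_runconfig.json") -> Path:
        # Persist next to the training script (JSON is an assumption of this sketch).
        path = directory / name
        path.write_text(json.dumps(asdict(self), indent=2))
        return path

    @classmethod
    def load(cls, path: Path) -> "ToyRunConfig":
        return cls(**json.loads(path.read_text()))


# In-memory object ...
config = ToyRunConfig(
    script="train.py",
    compute_target="cpu-cluster",
    use_existing_python=False,
    conda_spec="environment.yml",
)

# ... persisted to a file and read back.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = config.save(Path(tmp))
    restored = ToyRunConfig.load(saved_path)

print(restored == config)
```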

For example run configurations, see Select and use a compute target to train your model.

Datasets and datastores

Azure Machine Learning Datasets (preview) make it easier to access and work with your data. Datasets manage data in various scenarios such as model training and pipeline creation. Using the Azure Machine Learning SDK, you can access underlying storage, explore and prepare data, manage the life cycle of different Dataset definitions, and compare the Datasets used in training and in production.

Datasets provide methods for working with data in popular formats, such as from_delimited_files() or to_pandas_dataframe().

For more information, see Create and register Azure Machine Learning Datasets. For more examples using Datasets, see the sample notebooks.

A datastore is a storage abstraction over an Azure storage account. The datastore can use either an Azure blob container or an Azure file share as the back-end storage. Each workspace has a default datastore, and you can register additional datastores. Use the Python SDK API or the Azure Machine Learning CLI to store and retrieve files from the datastore.

Compute targets

A compute target lets you specify the compute resource where you run your training script or host your service deployment. This location may be your local machine or a cloud-based compute resource. Compute targets make it easy to change your compute environment without changing your code.

Learn more about the available compute targets for training and deployment.

Training scripts

To train a model, you specify the directory that contains the training script and associated files. You also specify an experiment name, which is used to store information that's gathered during training. During training, the entire directory is copied to the training environment (compute target), and the script that's specified by the run configuration is started. A snapshot of the directory is also stored under the experiment in the workspace.

For an example, see Tutorial: Train an image classification model with Azure Machine Learning service.

Runs

A run is a record that contains the following information:

  • Metadata about the run (timestamp, duration, and so on)
  • Metrics that are logged by your script
  • Output files that are autocollected by the experiment or explicitly uploaded by you
  • A snapshot of the directory that contains your scripts, prior to the run

You produce a run when you submit a script to train a model. A run can have zero or more child runs. For example, the top-level run might have two child runs, each of which might have its own child run.
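The parent and child relationship between runs is a tree. The sketch below models the example from the paragraph above (a top-level run with two child runs, each with its own child). `ToyRun` and its methods are hypothetical and only illustrate the data structure, not the service's run object.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ToyRun:
    """Hypothetical run record that may own zero or more child runs."""
    name: str
    children: List["ToyRun"] = field(default_factory=list)

    def child(self, name: str) -> "ToyRun":
        # Create a child run and attach it to this run.
        run = ToyRun(name)
        self.children.append(run)
        return run

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        # Depth-first traversal yielding (depth, name) pairs.
        yield depth, self.name
        for c in self.children:
            yield from c.walk(depth + 1)


top = ToyRun("top-level")
a = top.child("child-a")
b = top.child("child-b")
a.child("grandchild-a1")
b.child("grandchild-b1")

tree = list(top.walk())
for depth, name in tree:
    print("  " * depth + name)
```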

For an example of viewing runs that are produced by training a model, see Quickstart: Get started with Azure Machine Learning service.

GitHub tracking and integration

When you start a training run where the source directory is a local Git repository, information about the repository is stored in the run history. For example, the current commit ID for the repository is logged as part of the history. This works with runs submitted using an estimator, ML pipeline, or script run. It also works for runs submitted from the SDK or Machine Learning CLI.

Snapshots

When you submit a run, Azure Machine Learning compresses the directory that contains the script as a zip file and sends it to the compute target. The zip file is then extracted, and the script is run there. Azure Machine Learning also stores the zip file as a snapshot as part of the run record. Anyone with access to the workspace can browse a run record and download the snapshot.

Note

To prevent unnecessary files from being included in the snapshot, make an ignore file (.gitignore or .amlignore). Place this file in the snapshot directory and add the filenames to ignore in it. The .amlignore file uses the same syntax and patterns as the .gitignore file. If both files exist, the .amlignore file takes precedence.
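A rough local sketch of the snapshot and ignore-file behavior follows. It is not how the service implements snapshots: the helper names are hypothetical, pattern matching uses Python's `fnmatch` as a simplification (real .gitignore syntax is richer), and whether the ignore files themselves land in the snapshot is left as this sketch happens to handle it. The part taken from the text is the precedence rule: if both files exist, .amlignore wins.

```python
import fnmatch
import tempfile
import zipfile
from pathlib import Path


def load_ignore_patterns(source_dir: Path) -> list:
    # .amlignore takes precedence over .gitignore when both exist.
    for name in (".amlignore", ".gitignore"):
        ignore_file = source_dir / name
        if ignore_file.is_file():
            stripped = (ln.strip() for ln in ignore_file.read_text().splitlines())
            # Skip blank lines and comment lines.
            return [ln for ln in stripped if ln and not ln.startswith("#")]
    return []


def make_snapshot(source_dir: Path, zip_path: Path) -> list:
    """Zip source_dir into zip_path, skipping ignored files; return included paths."""
    patterns = load_ignore_patterns(source_dir)
    included = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if not path.is_file():
                continue
            rel = path.relative_to(source_dir).as_posix()
            # Simplification: match the relative path or the bare file name.
            if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(path.name, p)
                   for p in patterns):
                continue
            archive.write(path, rel)
            included.append(rel)
    return included


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "project"
    src.mkdir()
    (src / "train.py").write_text("print('training')\n")
    (src / "data.csv").write_text("x,y\n1,2\n")
    (src / ".gitignore").write_text("*.py\n")                    # ignored: .amlignore wins
    (src / ".amlignore").write_text("# skip local data\n*.csv\n")
    snapshot_files = make_snapshot(src, Path(tmp) / "snapshot.zip")

print(snapshot_files)
```

Because .amlignore takes precedence, `data.csv` is excluded and `train.py` is kept, even though .gitignore lists `*.py`.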

Activities

An activity represents a long-running operation. The following operations are examples of activities:

  • Creating or deleting a compute target
  • Running a script on a compute target

Activities can provide notifications through the SDK or the web UI so that you can easily monitor the progress of these operations.

Images

Images provide a way to reliably deploy a model, along with all components you need to use the model. An image contains the following items:

  • A model.
  • A scoring script or application. You use the script to pass input to the model and return the output of the model.
  • The dependencies that are needed by the model or scoring script or application. For example, you might include a Conda environment file that lists Python package dependencies.

Azure Machine Learning can create two types of images:

  • FPGA image: Used when you deploy to a field-programmable gate array in Azure.
  • Docker image: Used when you deploy to compute targets other than FPGA. Examples are Azure Container Instances and Azure Kubernetes Service.

The Azure Machine Learning service provides a base image, which is used by default. You can also provide your own custom images.

Image registry

Images are cataloged in the image registry in your workspace. You can provide additional metadata tags when you create the image, so that you can query them to find your image later.

For an example of creating an image, see Deploy an image classification model in Azure Container Instances.

For an example of deploying a model using a custom image, see How to deploy a model using a custom Docker image.

Deployment

A deployment is an instantiation of your model into either a web service that can be hosted in the cloud or an IoT module for integrated device deployments.

Web service deployments

A deployed web service can use Azure Container Instances, Azure Kubernetes Service, or FPGAs. You create the service from your model, script, and associated files. These are encapsulated in an image, which provides the runtime environment for the web service. The image has a load-balanced HTTP endpoint that receives scoring requests that are sent to the web service.
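To make the role of the scoring script more concrete, here is a minimal local sketch of one. The `init()` and `run()` function names follow the entry-script convention commonly used with the service; the stand-in model, the `{"data": ...}` request shape, and the `{"result": ...}` response shape are assumptions made for illustration, and nothing here calls the service or loads a registered model.

```python
import json

model = None  # populated once by init()


def init():
    """Called once when the service starts; a real script would load a model file."""
    global model
    # Assumption: a stand-in "model" that doubles every input value.
    model = lambda rows: [[2 * x for x in row] for row in rows]


def run(raw_data):
    """Called per scoring request with the request body as a JSON string."""
    try:
        data = json.loads(raw_data)["data"]
        return json.dumps({"result": model(data)})
    except Exception as exc:
        # Return the error as JSON instead of crashing the service.
        return json.dumps({"error": str(exc)})


# Simulate what the HTTP endpoint would do for one request.
init()
response = run(json.dumps({"data": [[1, 2], [3, 4]]}))
print(response)
```

Keeping model loading in `init()` and per-request work in `run()` means the model is loaded once rather than on every scoring request.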

Azure helps you monitor your web service deployment by collecting Application Insights telemetry or model telemetry, if you've chosen to enable this feature. The telemetry data is accessible only to you, and it's stored in your Application Insights and storage account instances.

If you've enabled automatic scaling, Azure automatically scales your deployment.

For an example of deploying a model as a web service, see Deploy an image classification model in Azure Container Instances.

IoT module deployments

A deployed IoT module is a Docker container that includes your model and associated script or application and any additional dependencies. You deploy these modules by using Azure IoT Edge on edge devices.

If you've enabled monitoring, Azure collects telemetry data from the model inside the Azure IoT Edge module. The telemetry data is accessible only to you, and it's stored in your storage account instance.

Azure IoT Edge ensures that your module is running, and it monitors the device that's hosting it.

ML pipelines

You use machine learning pipelines to create and manage workflows that stitch together machine learning phases. For example, a pipeline might include data preparation, model training, model deployment, and inference/scoring phases. Each phase can encompass multiple steps, each of which can run unattended in various compute targets.
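The idea of stitching phases together can be sketched as a list of named steps where each step consumes the previous step's output. This is a toy illustration only: `run_pipeline` and the three step functions are hypothetical, run sequentially in one process, and say nothing about how the service schedules steps on compute targets.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Run each step on the previous step's output; return the result and step history."""
    history = []
    for name, step in steps:
        data = step(data)
        history.append(name)
    return data, history


def prepare(raw):
    # Data preparation: scale values into the range (0, 1].
    peak = max(raw)
    return [x / peak for x in raw]


def train(values):
    # "Training": the toy model is just the mean of the prepared values.
    return {"values": values, "mean": sum(values) / len(values)}


def score(state):
    # Scoring: flag values above the learned mean.
    return [v > state["mean"] for v in state["values"]]


result, history = run_pipeline(
    [("prepare", prepare), ("train", train), ("score", score)],
    [2, 4, 8],
)
print(history, result)
```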

For more information about machine learning pipelines with this service, see Pipelines and Azure Machine Learning.

Logging

When you develop your solution, use the Azure Machine Learning Python SDK in your Python script to log arbitrary metrics. After the run, query the metrics to determine whether the run has produced the model you want to deploy.
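The log-then-query loop can be sketched without the SDK. `ToyRunLog` and `best_run` below are hypothetical helpers, and picking the run with the highest final accuracy is just one possible selection rule assumed for this example.

```python
from collections import defaultdict


class ToyRunLog:
    """Hypothetical per-run metric store; not the SDK's run object."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.metrics = defaultdict(list)  # metric name -> list of logged values

    def log(self, name, value):
        # Each call appends one value, so a metric can be logged repeatedly.
        self.metrics[name].append(value)


def best_run(runs, metric):
    # Compare runs by the last value logged for the metric; skip runs without it.
    candidates = [r for r in runs if r.metrics.get(metric)]
    return max(candidates, key=lambda r: r.metrics[metric][-1])


runs = []
for run_id, accuracies in [
    ("run-1", [0.71, 0.80]),
    ("run-2", [0.75, 0.91]),
    ("run-3", [0.60, 0.88]),
]:
    record = ToyRunLog(run_id)
    for acc in accuracies:
        record.log("accuracy", acc)
    runs.append(record)

best = best_run(runs, "accuracy")
print(best.run_id, best.metrics["accuracy"][-1])
```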

Next steps

To get started with Azure Machine Learning service, see: