How Azure Machine Learning service works: Architecture and concepts
Learn about the architecture, concepts, and workflow for Azure Machine Learning service. The major components of the service and the general workflow for using the service are shown in the following diagram:
The machine learning workflow generally follows this sequence:
- Develop machine learning training scripts in Python.
- Create and configure a compute target.
- Submit the scripts to the configured compute target to run in that environment. During training, the scripts can read from or write to datastore. And the records of execution are saved as runs in the workspace and grouped under experiments.
- Query the experiment for logged metrics from the current and past runs. If the metrics don't indicate a desired outcome, loop back to step 1 and iterate on your scripts.
- After a satisfactory run is found, register the persisted model in the model registry.
- Develop a scoring script that uses the model and Deploy the model as a web service in Azure, or to an IoT Edge device.
You perform these steps with any of the following:
- Azure Machine Learning SDK for Python
- Azure Machine Learning CLI
- The visual interface (preview) for Azure Machine Learning service
Although this article defines terms and concepts used by Azure Machine Learning service, it does not define terms and concepts for the Azure platform. For more information about Azure platform terminology, see the Microsoft Azure glossary.
The workspace is the top-level resource for Azure Machine Learning service. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning service.
A taxonomy of the workspace is illustrated in the following diagram:
For more information about workspaces, see What is an Azure Machine Learning workspace?.
An experiment is a grouping of many runs from a specified script. It always belongs to a workspace. When you submit a run, you provide an experiment name. Information for the run is stored under that experiment. If you submit a run and specify an experiment name that doesn't exist, a new experiment with that newly specified name is automatically created.
For an example of using an experiment, see Quickstart: Get started with Azure Machine Learning service.
At its simplest, a model is a piece of code that takes an input and produces output. Creating a machine learning model involves selecting an algorithm, providing it with data, and tuning hyperparameters. Training is an iterative process that produces a trained model, which encapsulates what the model learned during the training process.
A model is produced by a run in Azure Machine Learning. You can also use a model that's trained outside of Azure Machine Learning. You can register a model in an Azure Machine Learning service workspace.
Azure Machine Learning service is framework agnostic. When you create a model, you can use any popular machine learning framework, such as Scikit-learn, XGBoost, PyTorch, TensorFlow, and Chainer.
For an example of training a model, see Tutorial: Train an image classification model with Azure Machine Learning service.
The model registry keeps track of all the models in your Azure Machine Learning service workspace.
Models are identified by name and version. Each time you register a model with the same name as an existing one, the registry assumes that it's a new version. The version is incremented, and the new model is registered under the same name.
When you register the model, you can provide additional metadata tags and then use the tags when you search for models.
You can't delete models that are being used by an active deployment.
For an example of registering a model, see Train an image classification model with Azure Machine Learning.
A run configuration is a set of instructions that defines how a script should be run in a specified compute target. The configuration includes a wide set of behavior definitions, such as whether to use an existing Python environment or to use a Conda environment that's built from a specification.
A run configuration can be persisted into a file inside the directory that contains your training script, or it can be constructed as an in-memory object and used to submit a run.
For example run configurations, see Select and use a compute target to train your model.
Azure Machine Learning Datasets (preview) make it easier to access and work with your data. Datasets manage data in various scenarios such as model training and pipeline creation. Using the Azure Machine Learning SDK, you can access underlying storage, explore and prepare data, manage the life cycle of different Dataset definitions, and compare between Datasets used in training and in production.
Datasets provides methods for working with data in popular formats, such as using
For more information, see Create and register Azure Machine Learning Datasets.
For an example of using Datasets, see the sample notebooks.
A datastore is a storage abstraction over an Azure storage account. The datastore can use either an Azure blob container or an Azure file share as the back-end storage. Each workspace has a default datastore, and you can register additional datastores.
Use the Python SDK API or the Azure Machine Learning CLI to store and retrieve files from the datastore.
A compute target is the compute resource that you use to run your training script or host your service deployment. The supported compute targets are:
|Your local computer||✓|
|Azure Machine Learning compute||✓|
|A Linux VM in Azure
(such as the Data Science Virtual Machine)
|Azure Data Lake Analytics||✓|
|Apache Spark for HDInsight||✓|
|Azure Container Instances||✓|
|Azure Kubernetes Service||✓|
|Azure IoT Edge||✓|
|Field-programmable gate array (FPGA)||✓|
Compute targets are attached to a workspace. Compute targets other than the local machine are shared by users of the workspace.
Managed and unmanaged compute targets
Managed: Compute targets that are created and managed by Azure Machine Learning service. These compute targets are optimized for machine learning workloads. Azure Machine Learning compute is the only managed compute target as of December 4, 2018. Additional managed compute targets may be added in the future.
You can create machine learning compute instances directly through the workspace by using the Azure portal, the Azure Machine Learning SDK, or the Azure CLI. All other compute targets must be created outside the workspace and then attached to it.
Unmanaged: Compute targets that are not managed by Azure Machine Learning service. You might need to create them outside Azure Machine Learning and then attach them to your workspace before use. Unmanaged compute targets can require additional steps for you to maintain or to improve performance for machine learning workloads.
For information about selecting a compute target for training, see Select and use a compute target to train your model.
For information about selecting a compute target for deployment, see the Deploy models with Azure Machine Learning service.
To train a model, you specify the directory that contains the training script and associated files. You also specify an experiment name, which is used to store information that's gathered during training. During training, the entire directory is copied to the training environment (compute target), and the script that's specified by the run configuration is started. A snapshot of the directory is also stored under the experiment in the workspace.
For an example, see Tutorial: Train an image classification model with Azure Machine Learning service.
A run is a record that contains the following information:
- Metadata about the run (timestamp, duration, and so on)
- Metrics that are logged by your script
- Output files that are autocollected by the experiment or explicitly uploaded by you
- A snapshot of the directory that contains your scripts, prior to the run
You produce a run when you submit a script to train a model. A run can have zero or more child runs. For example, the top-level run might have two child runs, each of which might have its own child run.
For an example of viewing runs that are produced by training a model, see Quickstart: Get started with Azure Machine Learning service.
GitHub tracking and integration
When you start a training run where the source directory is a local Git repository, information about the repository is stored in the run history. For example, the current commit ID for the repository is logged as part of the history. This works with runs submitted using an estimator, ML pipeline, or script run. It also works for runs submitted from the SDK or Machine Learning CLI.
When you submit a run, Azure Machine Learning compresses the directory that contains the script as a zip file and sends it to the compute target. The zip file is then extracted, and the script is run there. Azure Machine Learning also stores the zip file as a snapshot as part of the run record. Anyone with access to the workspace can browse a run record and download the snapshot.
An activity represents a long running operation. The following operations are examples of activities:
- Creating or deleting a compute target
- Running a script on a compute target
Activities can provide notifications through the SDK or the web UI so that you can easily monitor the progress of these operations.
Images provide a way to reliably deploy a model, along with all components you need to use the model. An image contains the following items:
- A model.
- A scoring script or application. You use the script to pass input to the model and return the output of the model.
- The dependencies that are needed by the model or scoring script or application. For example, you might include a Conda environment file that lists Python package dependencies.
Azure Machine Learning can create two types of images:
- FPGA image: Used when you deploy to a field-programmable gate array in Azure.
- Docker image: Used when you deploy to compute targets other than FPGA. Examples are Azure Container Instances and Azure Kubernetes Service.
The Azure Machine Learning service provides a base image, which is used by default. You can also provide your own custom images.
For an example of creating an image, see Deploy an image classification model in Azure Container Instances.
The image registry keeps track of images that are created from your models. You can provide additional metadata tags when you create the image. Metadata tags are stored by the image registry, and you can query them to find your image.
A deployment is an instantiation of your model into either a web service that can be hosted in the cloud or an IoT module for integrated device deployments.
A deployed web service can use Azure Container Instances, Azure Kubernetes Service, or FPGAs. You create the service from your model, script, and associated files. These are encapsulated in an image, which provides the run time environment for the web service. The image has a load-balanced, HTTP endpoint that receives scoring requests that are sent to the web service.
Azure helps you monitor your web service deployment by collecting Application Insights telemetry or model telemetry, if you've chosen to enable this feature. The telemetry data is accessible only to you, and it's stored in your Application Insights and storage account instances.
If you've enabled automatic scaling, Azure automatically scales your deployment.
For an example of deploying a model as a web service, see Deploy an image classification model in Azure Container Instances.
A deployed IoT module is a Docker container that includes your model and associated script or application and any additional dependencies. You deploy these modules by using Azure IoT Edge on edge devices.
If you've enabled monitoring, Azure collects telemetry data from the model inside the Azure IoT Edge module. The telemetry data is accessible only to you, and it's stored in your storage account instance.
Azure IoT Edge ensures that your module is running, and it monitors the device that's hosting it.
You use machine learning pipelines to create and manage workflows that stitch together machine learning phases. For example, a pipeline might include data preparation, model training, model deployment, and inference/scoring phases. Each phase can encompass multiple steps, each of which can run unattended in various compute targets.
For more information about machine learning pipelines with this service, see Pipelines and Azure Machine Learning.
When you develop your solution, use the Azure Machine Learning Python SDK in your Python script to log arbitrary metrics. After the run, query the metrics to determine whether the run has produced the model you want to deploy.
To get started with Azure Machine Learning service, see:
Send feedback about: