Deploy machine learning models to Azure
Learn how to deploy your machine learning or deep learning model as a web service in the Azure cloud. You can also deploy to Azure IoT Edge devices.
The workflow is similar no matter where you deploy your model:
- Register the model (optional, see below).
- Prepare an inference configuration (unless using no-code deployment).
- Prepare an entry script (unless using no-code deployment).
- Choose a compute target.
- Deploy the model to the compute target.
- Test the resulting web service.
For more information on the concepts involved in the machine learning deployment workflow, see Manage, deploy, and monitor models with Azure Machine Learning.
Prerequisites
- An Azure Machine Learning workspace. For more information, see Create an Azure Machine Learning workspace.
- A model. If you don't have a trained model, you can use the model and dependency files provided in this tutorial.
- The Azure Command Line Interface (CLI) extension for the Machine Learning service.
Connect to your workspace
Follow the directions in the Azure CLI documentation for setting your subscription context.
Then run the following command to see the workspaces you have access to:
az ml workspace list --resource-group=<my resource group>
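If you prefer to work from Python rather than the CLI, the following is a minimal sketch (assuming you've downloaded the workspace's config.json to the working directory) that connects to the same workspace with the Azure Machine Learning SDK:
from azureml.core import Workspace

# Reads the subscription, resource group, and workspace name from config.json
# (downloadable from the workspace's Overview page in the Azure portal).
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\t')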
Register your model (optional)
A registered model is a logical container for one or more files that make up your model. For example, if you have a model that's stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.
Tip
Registering a model for version tracking is recommended but not required. If you would rather proceed without registering a model, you will need to specify a source directory in your InferenceConfig or inferenceconfig.json and ensure your model resides within that source directory.
Tip
When you register a model, you provide the path of either a cloud location (from a training run) or a local directory. This path is just to locate the files for upload as part of the registration process. It doesn't need to match the path used in the entry script. For more information, see Locate model files in your entry script.
Important
When using the Filter by Tags option on the Models page of Azure Machine Learning Studio, use TagName=TagValue (without a space) instead of TagName : TagValue.
The following examples demonstrate how to register a model.
Register a model from an Azure ML training run
az ml model register -n sklearn_mnist --asset-path outputs/sklearn_mnist_model.pkl --experiment-name myexperiment --run-id myrunid --tag area=mnist
Tip
If you get an error message stating that the ml extension isn't installed, use the following command to install it:
az extension add -n azure-cli-ml
The --asset-path parameter refers to the cloud location of the model. In this example, the path of a single file is used. To include multiple files in the model registration, set --asset-path to the path of a folder that contains the files.
Register a model from a local file
az ml model register -n onnx_mnist -p mnist/model.onnx
To include multiple files in the model registration, set -p to the path of a folder that contains the files.
For more information on az ml model register, consult the reference documentation.
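If you're working from the Python SDK instead of the CLI, a roughly equivalent registration might look like the sketch below; the file path and tag mirror the CLI examples above and are otherwise illustrative:
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()

# Register a local ONNX file as a model in the workspace.
model = Model.register(workspace=ws,
                       model_name='onnx_mnist',
                       model_path='mnist/model.onnx',
                       tags={'area': 'mnist'})
print(model.name, model.id, model.version, sep='\t')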
Define an entry script
The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model. It must understand the data that the model expects and returns.
The two things you need to accomplish in your entry script are:
- Loading your model (using a function called init())
- Running your model on input data (using a function called run())
Let's go through these steps in detail.
Writing init()
Loading a registered model
Your registered models are stored at a path pointed to by an environment variable called AZUREML_MODEL_DIR. For more information on the exact directory structure, see Locate model files in your entry script.
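As a minimal sketch (assuming a registered scikit-learn model whose file is named sklearn_mnist_model.pkl, as in the full example later in this article), init() typically builds the model path from that environment variable:
import os
import joblib  # or `from sklearn.externals import joblib` on older scikit-learn versions


def init():
    global model
    # AZUREML_MODEL_DIR points to the folder where the registered model files
    # were placed inside the deployed container.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')
    model = joblib.load(model_path)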
Loading a local model
If you opted against registering your model and passed your model as part of your source directory, you can read it in like you would locally, by passing the path to the model relative to your scoring script. For example, if you had a directory structured as:
- source_dir
  - score.py
  - models
    - model1.onnx
you could load your models with the following Python code:
import os

# The path is relative to the scoring script; load the file with the loader appropriate for your model format.
model = open(os.path.join('.', 'models', 'model1.onnx'))
Writing run()
run() is executed every time your model receives a scoring request, and expects the body of the request to be a JSON document with the following structure:
{
    "data": <model-specific-data-structure>
}
The input to run() is a Python string containing whatever follows the "data" key.
The following example demonstrates how to load a registered scikit-learn model and score it with numpy data:
import json
import numpy as np
import os
from sklearn.externals import joblib  # on scikit-learn 0.23 and later, use `import joblib` instead


def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')
    model = joblib.load(model_path)


def run(data):
    try:
        data = np.array(json.loads(data))
        result = model.predict(data)
        # You can return any data type, as long as it is JSON serializable.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error
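Before deploying, you can sanity-check the entry script by calling its functions directly in a local Python session. The following is a hedged sketch: it assumes the script above is saved as score.py, that a copy of sklearn_mnist_model.pkl is in the current directory, and that the model expects 784 features per row (adjust the payload to your model):
import json
import os

from score import init, run  # the entry script shown above, saved as score.py

os.environ['AZUREML_MODEL_DIR'] = '.'  # folder containing sklearn_mnist_model.pkl

init()

# run() receives the part of the request body after the "data" key as a JSON string.
sample = json.dumps([[0.0] * 784])
print(run(sample))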
For more advanced examples, including automatic Swagger schema generation and binary (for example, image) data, see the article on advanced entry script authoring.
Define an inference configuration
An inference configuration describes how to set up the web service that contains your model. It's used later, when you deploy the model.
A minimal inference configuration can be written as:
{
    "entryScript": "score.py",
    "sourceDirectory": "./working_dir"
}
This specifies that the machine learning deployment will use the file score.py in the ./working_dir directory to process incoming requests.
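If you use the Python SDK rather than the CLI, the equivalent of this JSON is an InferenceConfig object. The following sketch mirrors the minimal configuration above; in practice you would usually also attach an Environment that defines your script's package dependencies:
from azureml.core.model import InferenceConfig

# score.py inside ./working_dir will handle incoming requests.
inference_config = InferenceConfig(entry_script='score.py',
                                   source_directory='./working_dir')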
See this article for a more thorough discussion of inference configurations.
Tip
For information on using a custom Docker image with an inference configuration, see How to deploy a model using a custom Docker image.
Choose a compute target
The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use this table to choose an appropriate compute target.
Compute target | Used for | GPU support | FPGA support | Description |
---|---|---|---|---|
Local web service | Testing/debugging | | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. Supported in the designer. |
Azure Container Instances | Testing or development | | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Supported in the designer. |
Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. |
Note
Although compute targets like local, Azure Machine Learning compute, and Azure Machine Learning compute clusters support GPU for training and experimentation, using GPU for inference when deployed as a web service is supported only on AKS.
Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning compute.
When choosing a cluster SKU, first scale up and then scale out. Start with a machine that has 150% of the RAM your model requires, profile the result, and find a machine that has the performance you need. For example, if your model needs 4 GB of RAM, start with a VM size that provides at least 6 GB. Once you've found the right machine size, increase the number of machines to fit your need for concurrent inference.
Note
- Container instances are suitable only for small models less than 1 GB in size.
- Use single-node AKS clusters for dev/test of larger models.
Define a deployment configuration
The options available for a deployment configuration differ depending on the compute target you choose.
The entries in the deploymentconfig.json document map to the parameters for LocalWebservice.deploy_configuration. The following table describes the mapping between the entities in the JSON document and the parameters for the method:
JSON entity | Method parameter | Description |
---|---|---|
computeType | NA | The compute target. For local targets, the value must be local. |
port | port | The local port on which to expose the service's HTTP endpoint. |
This JSON is an example deployment configuration for use with the CLI:
{
    "computeType": "local",
    "port": 32267
}
For more information, see this reference.
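Because these entities map to LocalWebservice.deploy_configuration, the same configuration expressed with the Python SDK is roughly:
from azureml.core.webservice import LocalWebservice

# Equivalent of {"computeType": "local", "port": 32267}.
deployment_config = LocalWebservice.deploy_configuration(port=32267)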
Deploy your machine learning model
You are now ready to deploy your model.
Using a registered model
If you registered your model in your Azure Machine Learning workspace, run the following command, replacing mymodel:1 with the name of your model and its version number.
az ml model deploy -m mymodel:1 --ic inferenceconfig.json --dc deploymentconfig.json
Using a local model
If you would prefer not to register your model, you can pass the "sourceDirectory" parameter in your inferenceconfig.json to specify a local directory from which to serve your model.
az ml model deploy --ic inferenceconfig.json --dc deploymentconfig.json
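For comparison, a registered model can also be deployed with the Python SDK. This sketch reuses the configuration objects from the earlier SDK snippets; the model and service names are illustrative:
from azureml.core import Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice

ws = Workspace.from_config()

# Look up the registered model by name and version.
model = Model(ws, name='mymodel', version=1)

inference_config = InferenceConfig(entry_script='score.py',
                                   source_directory='./working_dir')
deployment_config = LocalWebservice.deploy_configuration(port=32267)

# Deploy as a local web service and block until the deployment finishes.
service = Model.deploy(ws, 'myservice', [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)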
Understanding service state
During model deployment, you may see the service state change while it fully deploys.
The following table describes the different service states:
Webservice state | Description | Final state? |
---|---|---|
Transitioning | The service is in the process of deployment. | No |
Unhealthy | The service has deployed but is currently unreachable. | No |
Unschedulable | The service cannot be deployed at this time due to lack of resources. | No |
Failed | The service has failed to deploy due to an error or crash. | Yes |
Healthy | The service is healthy and the endpoint is available. | Yes |
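With the Python SDK, you can check where a deployment is in this lifecycle and pull the container logs, which is often the quickest way to diagnose an Unhealthy or Failed service (the service name below is illustrative):
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(ws, name='myservice')

print(service.state)       # for example: Transitioning, Healthy, Failed
print(service.get_logs())  # container logs, useful when the service is Unhealthy or Failed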
Tip
When deploying, Docker images for compute targets are built and loaded from Azure Container Registry (ACR). By default, Azure Machine Learning creates an ACR that uses the basic service tier. Changing the ACR for your workspace to standard or premium tier may reduce the time it takes to build and deploy images to your compute targets. For more information, see Azure Container Registry service tiers.
Note
If you are deploying a model to Azure Kubernetes Service (AKS), we advise that you enable Azure Monitor for that cluster. This will help you understand overall cluster health and resource usage.
If you try to deploy a model to an unhealthy or overloaded cluster, you can expect to experience issues. If you need help troubleshooting AKS cluster problems, contact AKS Support.
Batch inference
Azure Machine Learning Compute targets are created and managed by Azure Machine Learning. They can be used for batch prediction from Azure Machine Learning pipelines.
For a walkthrough of batch inference with Azure Machine Learning Compute, see How to run batch predictions.
IoT Edge inference
Support for deploying to the edge is in preview. For more information, see Deploy Azure Machine Learning as an IoT Edge module.
Delete resources
To delete a deployed web service, use az ml service delete <name of webservice>.
To delete a registered model from your workspace, use az ml model delete <model id>.
Read more about deleting a web service and deleting a model.
Next steps
- Troubleshoot a failed deployment
- Deploy to Azure Kubernetes Service
- Create client applications to consume web services
- Update web service
- How to deploy a model using a custom Docker image
- Use TLS to secure a web service through Azure Machine Learning
- Monitor your Azure Machine Learning models with Application Insights
- Collect data for models in production
- Create event alerts and triggers for model deployments