Deploy models with Azure Machine Learning

APPLIES TO: Basic edition, Enterprise edition

Learn how to deploy your machine learning model as a web service in the Azure cloud or to Azure IoT Edge devices.

The workflow is similar no matter where you deploy your model:

  1. Register the model.
  2. Prepare an inference configuration.
  3. Prepare an entry script (unless using no-code deployment).
  4. Deploy the model to the compute target.
  5. Test the deployed model, also called a web service.

For more information on the concepts involved in the deployment workflow, see Manage, deploy, and monitor models with Azure Machine Learning.

Prerequisites

To follow the steps in this article, you need:

  • An Azure Machine Learning workspace.
  • A trained machine learning model that you want to deploy.
  • The Azure CLI with the azure-cli-ml extension installed (see the tip later in this article if the extension is missing).

Connect to your workspace

Follow the directions in the Azure CLI documentation for setting your subscription context.
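
For example, setting the subscription context typically looks like this (the subscription name is a placeholder):

az login
az account set --subscription "<my subscription name or ID>"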

Then run the following command to list the workspaces you have access to:

az ml workspace list --resource-group=<my resource group>

Register your model

A registered model is a logical container for one or more files that make up your model. For example, if you have a model that's stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.

Tip

When you register a model, you provide the path of either a cloud location (from a training run) or a local directory. This path is just to locate the files for upload as part of the registration process. It doesn't need to match the path used in the entry script. For more information, see Locate model files in your entry script.

Machine learning models are registered in your Azure Machine Learning workspace. The model can come from Azure Machine Learning or from somewhere else. When registering a model, you can optionally provide metadata about the model. The tags and properties dictionaries that you apply to a model registration can then be used to filter models.

The following examples demonstrate how to register a model.

Register a model from an Azure ML training run

az ml model register -n sklearn_mnist  --asset-path outputs/sklearn_mnist_model.pkl  --experiment-name myexperiment --run-id myrunid --tag area=mnist

Tip

If you get an error message stating that the ml extension isn't installed, use the following command to install it:

az extension add -n azure-cli-ml

The --asset-path parameter refers to the cloud location of the model. In this example, the path of a single file is used. To include multiple files in the model registration, set --asset-path to the path of a folder that contains the files.

Register a model from a local file

az ml model register -n onnx_mnist -p mnist/model.onnx

To include multiple files in the model registration, set -p to the path of a folder that contains the files.
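
For example, the following command registers everything in a local folder as a single model (the name and folder path are illustrative):

az ml model register -n mymodel -p ./models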

For more information on az ml model register, consult the reference documentation.

Define an entry script

The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model. It must understand the data that the model expects and returns.

The script contains two functions that load and run the model:

  • init(): Typically, this function loads the model into a global object. This function is run only once, when the Docker container for your web service is started.

  • run(input_data): This function uses the model to predict a value based on the input data. Inputs and outputs of the run typically use JSON for serialization and deserialization. You can also work with raw binary data. You can transform the data before sending it to the model or before returning it to the client.

The REST API expects the body of the request to be a JSON document with the following structure:

{
    "data":
        [
            <model-specific-data-structure>
        ]
}
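
For example, for the scikit-learn model in the script below, which expects an array of ten numeric features per sample, the request body might look like this:

{
    "data":
        [
            [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        ]
}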

The following example demonstrates how to load a registered scikit-learn model and score it with numpy data:

#Example: scikit-learn and Swagger
import json
import numpy as np
import os
from sklearn.externals import joblib
from sklearn.linear_model import Ridge

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment. Join this path with the filename of the model file.
    # It holds the path to the directory that contains the deployed model (./azureml-models/$MODEL_NAME/$VERSION).
    # If there are multiple models, this value is the path to the directory containing all deployed models (./azureml-models).
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')

    # If your model were stored in the same directory as your score.py, you could also use the following:
    # model_path = os.path.abspath(os.path.join(os.path.dirname(__file__), 'sklearn_mnist_model.pkl'))

    # Deserialize the model file back into a sklearn model
    model = joblib.load(model_path)


input_sample = np.array([[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]])
output_sample = np.array([3726.995])


@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        # You can return any data type, as long as it is JSON serializable.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

For more entry script examples, see the Azure Machine Learning documentation.

Define an inference configuration

The entries in the inferenceconfig.json document map to the parameters for the InferenceConfig class. The following table describes the mapping between entities in the JSON document and the parameters for the method:

| JSON entity | Method parameter | Description |
| ----------- | ---------------- | ----------- |
| entryScript | entry_script | Path to a local file that contains the code to run for the image. |
| sourceDirectory | source_directory | Optional. Path to folders that contain all files to create the image, which makes it easy to access any files within this folder or subfolder. You can upload an entire folder from your local machine as dependencies for the Webservice. Note: your entry_script, conda_file, and extra_docker_file_steps paths are relative paths to the source_directory path. |
| environment | environment | Optional. Azure Machine Learning environment. |

You can include full specifications of an Azure Machine Learning environment in the inference configuration file. If this environment doesn't exist in your workspace, Azure Machine Learning will create it. Otherwise, Azure Machine Learning will update the environment if necessary. The following JSON is an example:

{
    "entryScript": "score.py",
    "environment": {
        "docker": {
            "arguments": [],
            "baseDockerfile": null,
            "baseImage": "mcr.microsoft.com/azureml/base:intelmpi2018.3-ubuntu16.04",
            "enabled": false,
            "sharedVolumes": true,
            "shmSize": null
        },
        "environmentVariables": {
            "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
        },
        "name": "my-deploy-env",
        "python": {
            "baseCondaEnvironment": null,
            "condaDependencies": {
                "channels": [
                    "conda-forge"
                ],
                "dependencies": [
                    "python=3.6.2",
                    {
                        "pip": [
                            "azureml-defaults",
                            "azureml-telemetry",
                            "scikit-learn",
                            "inference-schema[numpy-support]"
                        ]
                    }
                ],
                "name": "project_environment"
            },
            "condaDependenciesFile": null,
            "interpreterPath": "python",
            "userManagedDependencies": false
        },
        "version": "1"
    }
}

You can also use an existing Azure Machine Learning environment in separate CLI parameters and remove the "environment" key from the inference configuration file. Use -e for the environment name, and --ev for the environment version. If you don't specify --ev, the latest version will be used. Here is an example of an inference configuration file:

{
    "entryScript": "score.py",
    "sourceDirectory": null
}

The following command demonstrates how to deploy a model using the previous inference configuration file (named myInferenceConfig.json).

It also uses the latest version of an existing Azure Machine Learning environment (named AzureML-Minimal).

az ml model deploy -m mymodel:1 --ic myInferenceConfig.json -e AzureML-Minimal --dc deploymentconfig.json

The following command demonstrates how to deploy a model by using the CLI:

az ml model deploy -n myservice -m mymodel:1 --ic inferenceconfig.json

In this example, the configuration specifies the following settings:

  • That the model requires Python
  • The entry script, which is used to handle web requests sent to the deployed service
  • The Conda file that describes the Python packages needed for inference

For information on using a custom Docker image with an inference configuration, see How to deploy a model using a custom Docker image.

Choose a compute target

The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use the table below to choose an appropriate compute target.

| Compute target | Used for | GPU support | FPGA support | Description |
| -------------- | -------- | ----------- | ------------ | ----------- |
| Local web service | Testing/debugging | | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Machine Learning compute instance web service | Testing/debugging | | | Use for limited testing and troubleshooting. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. AKS is the only option available for the designer. |
| Azure Container Instances | Testing or development | | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. |
| Azure Functions (Preview) | Real-time inference | | | |
| Azure IoT Edge (Preview) | IoT module | | | Deploy and serve ML models on IoT devices. |
| Azure Data Box Edge | Via IoT Edge | | Yes | Deploy and serve ML models on IoT devices. |

Note

Although compute targets like local, Azure Machine Learning compute instance, and Azure Machine Learning compute clusters support GPU for training and experimentation, using GPU for inference when deployed as a web service is supported only on Azure Kubernetes Service.

Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning Compute.

Note

  • Azure Container Instances (ACI) are suitable only for small models less than 1 GB in size.
  • We recommend using single-node Azure Kubernetes Service (AKS) clusters for dev-test of larger models.

Define a deployment configuration

The options available for a deployment configuration differ depending on the compute target you choose.

The entries in the deploymentconfig.json document map to the parameters for LocalWebservice.deploy_configuration. The following table describes the mapping between the entities in the JSON document and the parameters for the method:

| JSON entity | Method parameter | Description |
| ----------- | ---------------- | ----------- |
| computeType | NA | The compute target. For local targets, the value must be local. |
| port | port | The local port on which to expose the service's HTTP endpoint. |

This JSON is an example deployment configuration for use with the CLI:

{
    "computeType": "local",
    "port": 32267
}
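
The schema differs for other compute targets. For example, a deployment configuration for Azure Container Instances might look like the following (the resource values are illustrative):

{
    "computeType": "aci",
    "containerResourceRequirements": {
        "cpu": 0.5,
        "memoryInGB": 1.0
    }
}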

For more information, see the az ml model deploy documentation.

Deploy your model

You are now ready to deploy your model.

Using a registered model

If you registered your model in your Azure Machine Learning workspace, replace "mymodel:1" with the name of your model and its version number.

az ml model deploy -m mymodel:1 --ic inferenceconfig.json --dc deploymentconfig.json

Using a local model

If you would prefer not to register your model, you can pass the "sourceDirectory" parameter in your inferenceconfig.json to specify a local directory from which to serve your model.

az ml model deploy --ic inferenceconfig.json --dc deploymentconfig.json
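
After the deployment completes, you can send a quick test request to the service. One way is az ml service run; the example below assumes a service named myservice and the scikit-learn entry script shown earlier:

az ml service run -n myservice -d "{\"data\": [[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}"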

Delete resources

To delete a deployed web service, use az ml service delete <name of webservice>.

To delete a registered model from your workspace, use az ml model delete <model id>.

Read more about deleting a web service and deleting a model.

Prerequisites

To follow the steps in this article, you need:

  • An Azure Machine Learning workspace.
  • A trained machine learning model that you want to deploy.
  • The Azure Machine Learning SDK for Python installed.

Connect to your workspace

from azureml.core import Workspace
ws = Workspace.from_config(path=".file-path/ws_config.json")

For more information on using the SDK to connect to a workspace, see the Azure Machine Learning SDK for Python documentation.
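
If you don't have a workspace configuration file, you can also retrieve the workspace directly (the names and IDs below are placeholders):

from azureml.core import Workspace

# Placeholders: replace with your workspace name, subscription ID, and resource group.
ws = Workspace.get(name="myworkspace",
                   subscription_id="<my-subscription-id>",
                   resource_group="myresourcegroup")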

Register your model

A registered model is a logical container for one or more files that make up your model. For example, if you have a model that's stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.

Tip

When you register a model, you provide the path of either a cloud location (from a training run) or a local directory. This path is just to locate the files for upload as part of the registration process. It doesn't need to match the path used in the entry script. For more information, see Locate model files in your entry script.

Machine learning models are registered in your Azure Machine Learning workspace. The model can come from Azure Machine Learning or from somewhere else. When registering a model, you can optionally provide metadata about the model. The tags and properties dictionaries that you apply to a model registration can then be used to filter models.

The following examples demonstrate how to register a model.

Register a model from an Azure ML training run

When you use the SDK to train a model, you can receive either a Run object or an AutoMLRun object, depending on how you trained the model. Each object can be used to register a model created by an experiment run.

  • Register a model from an azureml.core.Run object:

    model = run.register_model(model_name='sklearn_mnist',
                               tags={'area': 'mnist'},
                               model_path='outputs/sklearn_mnist_model.pkl')
    print(model.name, model.id, model.version, sep='\t')
    

    The model_path parameter refers to the cloud location of the model. In this example, the path of a single file is used. To include multiple files in the model registration, set model_path to the path of a folder that contains the files. For more information, see the Run.register_model documentation.

  • Register a model from an azureml.train.automl.run.AutoMLRun object:

    description = 'My AutoML Model'
    model = run.register_model(description = description,
                               tags={'area': 'mnist'})

    print(run.model_id)
    

    In this example, the metric and iteration parameters aren't specified, so the iteration with the best primary metric will be registered. The model_id value returned from the run is used instead of a model name.

    For more information, see the AutoMLRun.register_model documentation.

Register a model from a local file

You can register a model by providing the local path of the model. You can provide the path of either a folder or a single file. You can use this method to register models trained with Azure Machine Learning and then downloaded. You can also use this method to register models trained outside of Azure Machine Learning.

Important

You should use only models that you create or obtain from a trusted source. You should treat serialized models as code, because security vulnerabilities have been discovered in a number of popular formats. Also, models might be intentionally trained with malicious intent to provide biased or inaccurate output.

  • Using the SDK and ONNX

    import os
    import urllib.request
    from azureml.core.model import Model
    # Download model
    onnx_model_url = "https://www.cntk.ai/OnnxModels/mnist/opset_7/mnist.tar.gz"
    urllib.request.urlretrieve(onnx_model_url, filename="mnist.tar.gz")
    os.system('tar xvzf mnist.tar.gz')
    # Register model
    model = Model.register(workspace = ws,
                           model_path = "mnist/model.onnx",
                           model_name = "onnx_mnist",
                           tags = {"onnx": "demo"},
                           description = "MNIST image classification CNN from ONNX Model Zoo")
    

    To include multiple files in the model registration, set model_path to the path of a folder that contains the files.

For more information, see the documentation for the Model class.

For more information on working with models trained outside Azure Machine Learning, see How to deploy an existing model.

Define an entry script

The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model. It must understand the data that the model expects and returns.

The script contains two functions that load and run the model:

  • init(): Typically, this function loads the model into a global object. This function is run only once, when the Docker container for your web service is started.

  • run(input_data): This function uses the model to predict a value based on the input data. Inputs and outputs of the run typically use JSON for serialization and deserialization. You can also work with raw binary data. You can transform the data before sending it to the model or before returning it to the client.

The REST API expects the body of the request to be a JSON document with the following structure:

{
    "data":
        [
            <model-specific-data-structure>
        ]
}

The following example demonstrates how to load a registered scikit-learn model and score it with numpy data:

#Example: scikit-learn and Swagger
import json
import numpy as np
import os
from sklearn.externals import joblib
from sklearn.linear_model import Ridge

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment. Join this path with the filename of the model file.
    # It holds the path to the directory that contains the deployed model (./azureml-models/$MODEL_NAME/$VERSION).
    # If there are multiple models, this value is the path to the directory containing all deployed models (./azureml-models).
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')

    # If your model were stored in the same directory as your score.py, you could also use the following:
    # model_path = os.path.abspath(os.path.join(os.path.dirname(__file__), 'sklearn_mnist_model.pkl'))

    # Deserialize the model file back into a sklearn model
    model = joblib.load(model_path)


input_sample = np.array([[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]])
output_sample = np.array([3726.995])


@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        # You can return any data type, as long as it is JSON serializable.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

For more entry script examples, see the Azure Machine Learning documentation.

Define an inference configuration

An inference configuration describes how to set up the web service that contains your model. It's used later, when you deploy the model.

Inference configuration uses Azure Machine Learning environments to define the software dependencies needed for your deployment. Environments allow you to create, manage, and reuse the software dependencies required for training and deployment. You can create an environment from custom dependency files or use one of the curated Azure Machine Learning environments. The following YAML is an example of a Conda dependencies file for inference. Note that you must include azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service. If you want to use automatic schema generation, your entry script must also import the inference-schema packages.


name: project_environment
dependencies:
  - python=3.6.2
  - scikit-learn=0.20.0
  - pip:
      # You must list azureml-defaults as a pip dependency
      - azureml-defaults>=1.0.45
      - inference-schema[numpy-support]

Important

If your dependency is available through both Conda and pip (from PyPI), Microsoft recommends using the Conda version, as Conda packages typically come with pre-built binaries that make installation more reliable.

For more information, see Understanding Conda and Pip.

To check if your dependency is available through Conda, use the conda search <package-name> command, or use the package indexes at https://anaconda.org/anaconda/repo and https://anaconda.org/conda-forge/repo.

You can use the dependencies file to create an environment object and save it to your workspace for future use:

from azureml.core.environment import Environment
myenv = Environment.from_conda_specification(name = 'myenv',
                                             file_path = 'path-to-conda-specification-file')
myenv.register(workspace=ws)

For a thorough discussion of using and customizing Python environments with Azure Machine Learning, see Create & use software environments in Azure Machine Learning.

For information on using a custom Docker image with an inference configuration, see How to deploy a model using a custom Docker image.

The following example demonstrates loading an environment from your workspace and then using it with the inference configuration:

from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig


myenv = Environment.get(workspace=ws, name='myenv', version='1')
inference_config = InferenceConfig(entry_script='path-to-score.py',
                                    environment=myenv)

For more information on environments, see Create and manage environments for training and deployment.

For more information on inference configuration, see the InferenceConfig class documentation.
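
If your entry script relies on other files, you can also set source_directory; the entry script path is then relative to that directory. A minimal sketch (paths are illustrative):

from azureml.core.model import InferenceConfig

# Illustrative paths: source_dir contains score.py plus any helper files it imports.
inference_config = InferenceConfig(source_directory='./source_dir',
                                   entry_script='score.py',
                                   environment=myenv)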

Choose a compute target

The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use the table below to choose an appropriate compute target.

| Compute target | Used for | GPU support | FPGA support | Description |
| -------------- | -------- | ----------- | ------------ | ----------- |
| Local web service | Testing/debugging | | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Machine Learning compute instance web service | Testing/debugging | | | Use for limited testing and troubleshooting. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. AKS is the only option available for the designer. |
| Azure Container Instances | Testing or development | | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. |
| Azure Functions (Preview) | Real-time inference | | | |
| Azure IoT Edge (Preview) | IoT module | | | Deploy and serve ML models on IoT devices. |
| Azure Data Box Edge | Via IoT Edge | | Yes | Deploy and serve ML models on IoT devices. |

Note

Although compute targets like local, Azure Machine Learning compute instance, and Azure Machine Learning compute clusters support GPU for training and experimentation, using GPU for inference when deployed as a web service is supported only on Azure Kubernetes Service.

Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning Compute.

Note

  • Azure Container Instances (ACI) are suitable only for small models less than 1 GB in size.
  • We recommend using single-node Azure Kubernetes Service (AKS) clusters for dev-test of larger models.

Define a deployment configuration

Before deploying your model, you must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. For example, when you deploy a model locally, you must specify the port where the service accepts requests. The deployment configuration isn't part of your entry script. It's used to define the characteristics of the compute target that will host the model and entry script.

You might also need to create the compute resource, if, for example, you don't already have an Azure Kubernetes Service (AKS) instance associated with your workspace.

The following table provides an example of creating a deployment configuration for each compute target:

| Compute target | Deployment configuration example |
| -------------- | -------------------------------- |
| Local | deployment_config = LocalWebservice.deploy_configuration(port=8890) |
| Azure Container Instances | deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1) |
| Azure Kubernetes Service | deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1) |

The classes for local, Azure Container Instances, and AKS web services can be imported from azureml.core.webservice:

from azureml.core.webservice import AciWebservice, AksWebservice, LocalWebservice
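
For example, an AKS deployment configuration that enables autoscaling of the service replicas might look like this (the replica counts are illustrative):

from azureml.core.webservice import AksWebservice

# Illustrative values: tune the replica counts to your expected load.
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                        memory_gb=1,
                                                        autoscale_enabled=True,
                                                        autoscale_min_replicas=1,
                                                        autoscale_max_replicas=4)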

Deploy your model

You are now ready to deploy your model. The example below demonstrates a local deployment. The syntax will vary depending on the compute target you chose in the previous step.

from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice, Webservice

deployment_config = LocalWebservice.deploy_configuration(port=8890)
service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

For more information, see the documentation for LocalWebservice, Model.deploy(), and Webservice.
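
Once the service is deployed, you can send it a test request. A minimal sketch, assuming the scikit-learn entry script shown earlier:

import json

# The payload must match the structure the entry script expects.
input_payload = json.dumps({"data": [[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]})
prediction = service.run(input_data=input_payload)
print(prediction)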

Delete resources

To delete a deployed web service, use service.delete(). To delete a registered model, use model.delete().

For more information, see the documentation for Webservice.delete() and Model.delete().

Understanding service state

During model deployment, you may see the service state change while it fully deploys.

The following table describes the different service states:

| Webservice state | Description | Final state? |
| ---------------- | ----------- | ------------ |
| Transitioning | The service is in the process of deployment. | No |
| Unhealthy | The service has deployed but is currently unreachable. | No |
| Unschedulable | The service cannot be deployed at this time due to lack of resources. | No |
| Failed | The service has failed to deploy due to an error or crash. | Yes |
| Healthy | The service is healthy and the endpoint is available. | Yes |
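
If a deployment doesn't reach the Healthy state, the container logs are usually the quickest way to diagnose the problem. A minimal sketch:

# Wait for the deployment to finish, then inspect the logs if the service isn't healthy.
service.wait_for_deployment(show_output=True)
if service.state != "Healthy":
    print(service.get_logs())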

Batch inference

Azure Machine Learning Compute targets are created and managed by Azure Machine Learning. They can be used for batch prediction from Azure Machine Learning pipelines.

For a walkthrough of batch inference with Azure Machine Learning Compute, see How to run batch predictions.

IoT Edge inference

Support for deploying to the edge is in preview. For more information, see Deploy Azure Machine Learning as an IoT Edge module.

Next steps