Deploy models with the Azure Machine Learning service

Learn how to deploy your machine learning model as a web service in the Azure cloud, or to IoT Edge devices.

The workflow is similar regardless of where you deploy your model:

  1. Register the model.
  2. Prepare to deploy (specify assets, usage, and compute target).
  3. Deploy the model to the compute target.
  4. Test the deployed model, also called a web service.

For more information on the concepts involved in the deployment workflow, see Manage, deploy, and monitor models with Azure Machine Learning Service.


Register your model

Register your machine learning models in your Azure Machine Learning workspace. The model can come from Azure Machine Learning or from somewhere else. The following examples demonstrate the different ways to register a model:

Register a model from an Experiment Run

  • Scikit-Learn example using the SDK

    model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')
    print(,, model.version, sep='\t')
  • Using the CLI

    az ml model register -n sklearn_mnist  --asset-path outputs/sklearn_mnist_model.pkl  --experiment-name myexperiment
  • Using VS Code

    Register models using any model files or folders with the VS Code extension.

Register an externally created model


You should only use models that you create or obtain from a trusted source. Serialized models should be treated as code, as security vulnerabilities have been discovered in a number of popular formats. Further, models may be intentionally trained with malicious intent to provide biased or inaccurate output.

You can register an externally created model by providing a local path to the model. You can provide either a folder or a single file.

  • ONNX example with the Python SDK:

    import urllib.request
    from azureml.core.model import Model

    onnx_model_url = ""
    urllib.request.urlretrieve(onnx_model_url, filename="mnist.tar.gz")
    !tar xvzf mnist.tar.gz
    model = Model.register(workspace = ws,
                           model_path = "mnist/model.onnx",
                           model_name = "onnx_mnist",
                           tags = {"onnx": "demo"},
                           description = "MNIST image classification CNN from ONNX Model Zoo")
  • Using the CLI

    az ml model register -n onnx_mnist -p mnist/model.onnx

Time estimate: Approximately 10 seconds.

For more information, see the reference documentation for the Model class.

Choose a compute target

The following compute targets, or compute resources, can be used to host your web service deployment.

Compute target                             Usage                Description
Local web service                          Testing/debug        Good for limited testing and troubleshooting.
Azure Kubernetes Service (AKS)             Real-time inference  Good for high-scale production deployments. Provides autoscaling and fast response times.
Azure Container Instances (ACI)            Testing              Good for low-scale, CPU-based workloads.
Azure Machine Learning Compute (Preview)   Batch inference      Run batch scoring on serverless compute. Supports normal and low-priority VMs.
Azure IoT Edge (Preview)                   IoT module           Deploy and serve ML models on IoT devices.

Prepare to deploy

To deploy as a web service, you must create an inference configuration (InferenceConfig) and a deployment configuration. Inference, or model scoring, is the phase where the deployed model is used for prediction, most commonly on production data. In the inference config, you specify the scripts and dependencies needed to serve your model. In the deployment config you specify details of how to serve the model on the compute target.

1. Define your entry script & dependencies

The entry script receives data submitted to a deployed web service, and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model; it must understand the data that the model expects and returns.

The script contains two functions that load and run the model:

  • init(): Typically this function loads the model into a global object. This function is run only once when the Docker container for your web service is started.

  • run(input_data): This function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization. You can also work with raw binary data. You can transform the data before sending to the model, or before returning to the client.
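The init()/run() contract described above can be exercised locally with a stand-in model. This is only a sketch: a real entry script would load the registered model via Model.get_model_path(), and the stand-in function below is purely hypothetical.

```python
import json

def init():
    # A real entry script would call Model.get_model_path() here and
    # deserialize the registered model. A stand-in function (hypothetical)
    # lets the init()/run() shape be tested without a workspace.
    global model
    model = lambda rows: [sum(row) for row in rows]

def run(raw_data):
    try:
        # The service passes the request body to run(): JSON in, JSON out.
        data = json.loads(raw_data)['data']
        result = model(data)
        # Any JSON-serializable value can be returned to the client.
        return result
    except Exception as e:
        # Returning the error string keeps failures visible to the caller.
        return str(e)
```

Calling init() once, then run(json.dumps({'data': [[1, 2, 3]]})), returns [6]; malformed input falls through to the except branch and comes back as an error string.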

(Optional) Automatic Swagger schema generation

To automatically generate a schema for your web service, provide a sample of the input and/or output in the constructor for one of the defined type objects, and the type and sample are used to automatically create the schema. Azure Machine Learning service then creates an OpenAPI (Swagger) specification for the web service during deployment.

The following types are currently supported:

  • pandas
  • numpy
  • pyspark
  • standard Python object

To use schema generation, include the inference-schema package in your conda environment file. The following example specifies inference-schema[numpy-support] because the entry script uses a numpy parameter type:

Example dependencies file

The following is an example of a Conda dependencies file for inference.

name: project_environment
dependencies:
  - python=3.6.2
  - pip:
    - azureml-defaults
    - scikit-learn==0.20.0
    - inference-schema[numpy-support]

If you want to use automatic schema generation, your entry script must import the inference-schema package.

Define the input and output sample formats in the input_sample and output_sample variables, which represent the request and response formats for the web service. Use these samples in the input and output function decorators on the run() function. The scikit-learn example below uses schema generation.


After deploying the service, use the swagger_uri property to retrieve the schema JSON document.

Example entry script

The following example demonstrates how to accept and return JSON data:

#example: scikit-learn and Swagger
import json
import numpy as np
from sklearn.externals import joblib
from sklearn.linear_model import Ridge
from azureml.core.model import Model

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType

def init():
    global model
    # "sklearn_regression_model.pkl" is the name under which the model was
    # registered in the workspace. In the deployed service, this call returns
    # the path to the model file on the container's local disk, which differs
    # from the behavior when the same code runs locally.
    model_path = Model.get_model_path('sklearn_regression_model.pkl')
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

input_sample = np.array([[10,9,8,7,6,5,4,3,2,1]])
output_sample = np.array([3726.995])

@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        result = model.predict(data)
        # you can return any datatype as long as it is JSON-serializable
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

For more example entry scripts, see the Azure Machine Learning documentation.

2. Define your InferenceConfig

The inference configuration describes how to configure the model to make predictions. The following example demonstrates how to create an inference configuration:

inference_config = InferenceConfig(source_directory="C:/abc",
                                   runtime= "python",
                                   entry_script="x/y/score.py",
                                   conda_file="env/myenv.yml")

In this example, the configuration contains the following items:

  • A directory that contains the assets needed for inference
  • The runtime, which indicates that this model requires Python
  • The entry script, which is used to handle web requests sent to the deployed service
  • The conda file that describes the Python packages needed for inference

For information on InferenceConfig functionality, see the Advanced configuration section.

3. Define your Deployment configuration

Before deploying, you must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. For example, when deploying locally you must specify the port where the service accepts requests.

You may also need to create the compute resource, for example, if you do not already have an Azure Kubernetes Service cluster associated with your workspace.

The following table provides an example of creating a deployment configuration for each compute target:

Compute target              Deployment configuration example
Local                       deployment_config = LocalWebservice.deploy_configuration(port=8890)
Azure Container Instances   deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
Azure Kubernetes Service    deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

The following sections demonstrate how to create the deployment configuration, and then use it to deploy the web service.

Deploy to target

Local deployment

To deploy locally, you need to have Docker installed on your local machine.

The examples in this section use Model.deploy(), which takes the registered model, the inference configuration, and the deployment configuration. For information on other deployment methods, see deploy_from_image and deploy_from_model.

  • Using the SDK

    deployment_config = LocalWebservice.deploy_configuration(port=8890)
    service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config)
    service.wait_for_deployment(show_output = True)
  • Using the CLI

    az ml model deploy -m sklearn_mnist:1 -ic inferenceconfig.json -dc deploymentconfig.json

Azure Container Instances (DEVTEST)

Use Azure Container Instances for deploying your models as a web service if one or more of the following conditions is true:

  • You need to quickly deploy and validate your model.
  • You are testing a model that is under development.

To see quota and region availability for ACI, see the Quotas and region availability for Azure Container Instances article.

  • Using the SDK

    deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
    service = Model.deploy(ws, "aciservice", [model], inference_config, deployment_config)
    service.wait_for_deployment(show_output = True)
  • Using the CLI

    az ml model deploy -m sklearn_mnist:1 -n aciservice -ic inferenceconfig.json -dc deploymentconfig.json
  • Using VS Code

    To deploy your models with VS Code you don't need to create an ACI container to test in advance, because ACI containers are created on the fly.

For more information, see the reference documentation for the AciWebservice and Webservice classes.

Azure Kubernetes Service (PRODUCTION)

You can use an existing AKS cluster or create a new one using the Azure Machine Learning SDK, CLI, or the Azure portal.

If you already have an AKS cluster attached, you can deploy to it. If you haven't created or attached an AKS cluster, follow the process to create a new AKS cluster.

  • Using the SDK

    aks_target = AksCompute(ws,"myaks")
    deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
    service = Model.deploy(ws, "aksservice", [model], inference_config, deployment_config, aks_target)
    service.wait_for_deployment(show_output = True)
  • Using the CLI

    az ml model deploy -ct myaks -m mymodel:1 -n aksservice -ic inferenceconfig.json -dc deploymentconfig.json
  • Using VS Code

    You can also deploy to AKS via the VS Code extension, but you'll need to configure AKS clusters in advance.

Learn more about AKS deployment and autoscale in the AksWebservice.deploy_configuration reference.

Create a new AKS cluster

Time estimate: Approximately 5 minutes.


Creating or attaching an AKS cluster is a one time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy.

For more information on setting autoscale_target_utilization, autoscale_max_replicas, and autoscale_min_replicas, see the AksWebservice.deploy_configuration reference. The following example demonstrates how to create a new Azure Kubernetes Service cluster:

from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (you can also provide parameters to customize this)
prov_config = AksCompute.provisioning_configuration()

aks_name = 'myaks'
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws,
                                    name = aks_name,
                                    provisioning_configuration = prov_config)

# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)

For more information on creating an AKS cluster outside of the Azure Machine Learning SDK, see the Azure Kubernetes Service documentation.


For provisioning_configuration(), if you pick custom values for agent_count and vm_size, then you need to make sure agent_count multiplied by vm_size is greater than or equal to 12 virtual CPUs. For example, if you use a vm_size of "Standard_D3_v2", which has 4 virtual CPUs, then you should pick an agent_count of 3 or greater.
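The sizing rule above is ceiling arithmetic over vCPU counts. A quick sketch, assuming the vCPU figures below (the Standard_D3_v2 value comes from the text; the others are illustrative, not a live lookup of Azure VM sizes):

```python
# Hypothetical vCPU counts per VM size; the text above states that
# Standard_D3_v2 has 4 virtual CPUs.
VCPUS_PER_VM = {"Standard_D3_v2": 4, "Standard_D2_v2": 2}

def min_agent_count(vm_size, required_vcpus=12):
    """Smallest agent_count such that agent_count * vCPUs >= required_vcpus."""
    vcpus = VCPUS_PER_VM[vm_size]
    return -(-required_vcpus // vcpus)  # ceiling division

print(min_agent_count("Standard_D3_v2"))  # 3 agents * 4 vCPUs = 12
```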

Time estimate: Approximately 20 minutes.

Attach an existing AKS cluster

If you already have an AKS cluster in your Azure subscription, and it is version 1.12.## with at least 12 virtual CPUs, you can use it to deploy your image. The following code demonstrates how to attach an existing AKS 1.12.## cluster to your workspace:

from azureml.core.compute import AksCompute, ComputeTarget
# Set the resource group that contains the AKS cluster and the cluster name
resource_group = 'myresourcegroup'
cluster_name = 'mycluster'

# Attach the cluster to your workspace
attach_config = AksCompute.attach_configuration(resource_group = resource_group,
                                         cluster_name = cluster_name)
aks_target = ComputeTarget.attach(ws, 'mycompute', attach_config)

Consume web services

Every deployed web service provides a REST API, so you can create client applications in a variety of programming languages. If you have enabled authentication for your service, you need to provide a service key as a token in your request header.

Request-response consumption

Here is an example of how to invoke your service in Python:

import requests
import json

headers = {'Content-Type':'application/json'}

if service.auth_enabled:
    headers['Authorization'] = 'Bearer '+service.get_keys()[0]

test_sample = json.dumps({'data': [[10,9,8,7,6,5,4,3,2,1]]})
test_sample = bytes(test_sample, encoding='utf8')

response =, data=test_sample, headers=headers)
print(response.text)
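The response body is the JSON-serialized return value of run(). A minimal client-side decode, assuming the list-of-predictions shape produced by the scikit-learn entry script earlier in this article (the values below are hypothetical):

```python
import json

# Hypothetical response body, shaped like the JSON list returned by
# the example entry script's run() function.
body = '[3726.995, 3014.12]'
predictions = json.loads(body)
print(predictions[0])  # 3726.995
```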

For more information, see Create client applications to consume webservices.

Batch consumption

Azure Machine Learning Compute targets are created and managed by the Azure Machine Learning service. They can be used for batch prediction from Azure Machine Learning Pipelines.

For a walkthrough of batch inference with Azure Machine Learning Compute, read the How to Run Batch Predictions article.

IoT Edge inference

Support for deploying to the edge is in preview. For more information, see the Deploy Azure Machine Learning as an IoT Edge module article.

Update web services

When you register a new model, you must manually update each service that should use it. To update the web service, use the update method. The following code demonstrates how to update a web service to use the new model:

from azureml.core.webservice import Webservice
from azureml.core.model import Model

# register new model
new_model = Model.register(model_path = "outputs/sklearn_mnist_model.pkl",
                       model_name = "sklearn_mnist",
                       tags = {"key": "0.1"},
                       description = "test",
                       workspace = ws)

service_name = 'myservice'
# Retrieve existing service
service = Webservice(name = service_name, workspace = ws)

# Update to new model(s)
service.update(models = [new_model])

Advanced settings

Use a custom base image

Internally, InferenceConfig creates a Docker image that contains the model and other assets needed by the service. If not specified, a default base image is used.

When creating an image to use with your inference configuration, the image must meet the following requirements:

  • Ubuntu 16.04 or greater.
  • Conda 4.5.# or greater.
  • Python 3.5.# or 3.6.#.

To use a custom image, set the base_image property of the inference configuration to the address of the image. The following example demonstrates how to use an image from both a public and private Azure Container Registry:

# use an image available in public Container Registry without authentication
inference_config.base_image = ""

# or, use an image available in a private Container Registry
inference_config.base_image = ""
inference_config.base_image_registry.address = ""
inference_config.base_image_registry.username = "username"
inference_config.base_image_registry.password = "password"

The following image URIs are for images provided by Microsoft, and can be used without providing a user name or password value:


To use these images, set the base_image to the URI from the list above. Set base_image_registry.address to


Microsoft images that use CUDA or TensorRT must be used on Microsoft Azure Services only.

For more information on uploading your own images to an Azure Container Registry, see Push your first image to a private Docker container registry.

If your model is trained on Azure Machine Learning Compute, using version 1.0.22 or greater of the Azure Machine Learning SDK, an image is created during training. The following example demonstrates how to use this image:

# Use an image built during training with SDK 1.0.22 or greater
image_config.base_image =["AzureML.DerivedImageName"]

Clean up resources

To delete a deployed web service, use service.delete(). To delete a registered model, use model.delete().

For more information, see the reference documentation for Webservice.delete() and Model.delete().

Next steps