Deploy machine learning models to Azure

Learn how to deploy your machine learning or deep learning model as a web service in the Azure cloud. You can also deploy to Azure IoT Edge devices.

The workflow is similar no matter where you deploy your model:

  1. Register the model (optional, see below).
  2. Prepare an inference configuration (unless using no-code deployment).
  3. Prepare an entry script (unless using no-code deployment).
  4. Choose a compute target.
  5. Deploy the model to the compute target.
  6. Test the resulting web service.

For more information on the concepts involved in the machine learning deployment workflow, see Manage, deploy, and monitor models with Azure Machine Learning.


Connect to your workspace

Follow the directions in the Azure CLI documentation for setting your subscription context.

Then run:

az ml workspace list --resource-group=<my resource group>

to see the workspaces you have access to.

Register your model (optional)

A registered model is a logical container for one or more files that make up your model. For example, if you have a model that's stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.


Registering a model for version tracking is recommended but not required. If you would rather proceed without registering a model, you will need to specify a source directory in your InferenceConfig or inferenceconfig.json and ensure your model resides within that source directory.


When you register a model, you provide the path of either a cloud location (from a training run) or a local directory. This path is just to locate the files for upload as part of the registration process. It doesn't need to match the path used in the entry script. For more information, see Locate model files in your entry script.


When using the Filter by Tags option on the Models page of Azure Machine Learning studio, instead of TagName : TagValue, use TagName=TagValue (without spaces).

The following examples demonstrate how to register a model.

Register a model from an Azure ML training run

az ml model register -n sklearn_mnist --asset-path outputs/sklearn_mnist_model.pkl --experiment-name myexperiment --run-id myrunid --tag area=mnist


If you get an error message stating that the ml extension isn't installed, use the following command to install it:

az extension add -n azure-cli-ml

The --asset-path parameter refers to the cloud location of the model. In this example, the path of a single file is used. To include multiple files in the model registration, set --asset-path to the path of a folder that contains the files.

Register a model from a local file

az ml model register -n onnx_mnist -p mnist/model.onnx

To include multiple files in the model registration, set -p to the path of a folder that contains the files.

For more information on az ml model register, consult the reference documentation.

Define an entry script

The entry script receives data submitted to a deployed web service and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model. It must understand the data that the model expects and returns.

The two things you need to accomplish in your entry script are:

  1. Loading your model (using a function called init())
  2. Running your model on input data (using a function called run())

Let's go through these steps in detail.
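Before diving into the details, the overall shape of an entry script can be sketched as follows. This is a minimal sketch only: the echo behavior in run() is illustrative, and a real script would load an actual model in init() and score it in run().

```python
import json

def init():
    # Called once when the service starts: load the model into a global.
    global model
    model = None  # placeholder for a real model object

def run(data):
    # Called on every scoring request; data is the request payload string.
    payload = json.loads(data)
    return {'echo': payload}

init()
print(run('[1, 2, 3]'))  # {'echo': [1, 2, 3]}
```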

Writing init()

Loading a registered model

Your registered models are stored at a path pointed to by an environment variable called AZUREML_MODEL_DIR. For more information on the exact directory structure, see Locate model files in your entry script.
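For example, init() can build the model path from that variable. This is a sketch: the directory value below is illustrative and set manually so the snippet runs standalone; in a real deployment, Azure Machine Learning sets AZUREML_MODEL_DIR for you.

```python
import os

# Illustrative value only; Azure sets AZUREML_MODEL_DIR in a real deployment.
os.environ.setdefault(
    'AZUREML_MODEL_DIR', '/var/azureml-app/azureml-models/sklearn_mnist/1')

# Build the path to a model file relative to the registered-model root.
model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'),
                          'sklearn_mnist_model.pkl')
print(model_path)
```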

Loading a local model

If you opted against registering your model and passed your model as part of your source directory, you can read it in like you would locally, by passing the path to the model relative to your scoring script. For example, if you had a directory structured as:

- source_dir
    - models
        - model1.onnx

you could load your models with the following Python code:

import os

# Open the model file relative to the scoring script, in binary mode;
# an ONNX runtime would then consume the path or bytes.
model_path = os.path.join('.', 'models', 'model1.onnx')
with open(model_path, 'rb') as f:
    model_bytes = f.read()

Writing run()

run() is executed every time your model receives a scoring request, and expects the body of the request to be a JSON document with the following structure:

    "data": <model-specific-data-structure>

The input to run() is a Python string containing whatever follows the "data" key.
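As a sketch of that contract, a request body like the one below yields the string that run() receives (the payload values here are illustrative):

```python
import json

# An example scoring-request body with the structure described above.
body = '{"data": [[1, 2, 3], [4, 5, 6]]}'

# The service hands run() the JSON text that follows the "data" key:
data = json.dumps(json.loads(body)['data'])
print(data)  # [[1, 2, 3], [4, 5, 6]]

# Inside run(), that string parses back into the model's input structure:
parsed = json.loads(data)
print(parsed[1][0])  # 4
```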

The following example demonstrates how to load a registered scikit-learn model and score it with numpy data:

import json
import numpy as np
import os
import joblib  # on older scikit-learn versions: from sklearn.externals import joblib

def init():
    global model
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')
    model = joblib.load(model_path)

def run(data):
    try:
        data = np.array(json.loads(data))
        result = model.predict(data)
        # You can return any data type, as long as it is JSON serializable.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

For more advanced examples, including automatic Swagger schema generation and binary (for example, image) data, read the article on advanced entry script authoring.

Define an inference configuration

An inference configuration describes how to set up the web service containing your model. It's used later, when you deploy the model.

A minimal inference configuration can be written as:

    "entryScript": "",
    "sourceDirectory": "./working_dir"

This specifies that the machine learning deployment will use the file named by entryScript in the ./working_dir directory to process incoming requests.

See this article for a more thorough discussion of inference configurations.


For information on using a custom Docker image with an inference configuration, see How to deploy a model using a custom Docker image.

Choose a compute target

The compute target you use to host your model will affect the cost and availability of your deployed endpoint. Use this table to choose an appropriate compute target.

| Compute target | Used for | GPU support | FPGA support | Description |
| --- | --- | --- | --- | --- |
| Local web service | Testing/debugging | | | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. |
| Azure Kubernetes Service (AKS) | Real-time inference | Yes (web service deployment) | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn't supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. Supported in the designer. |
| Azure Container Instances | Testing or development | | | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. Supported in the designer. |
| Azure Machine Learning compute clusters | Batch inference | Yes (machine learning pipeline) | | Run batch scoring on serverless compute. Supports normal and low-priority VMs. No support for real-time inference. |


Although compute targets like local, Azure Machine Learning compute, and Azure Machine Learning compute clusters support GPU for training and experimentation, using GPU for inference when deployed as a web service is supported only on AKS.

Using a GPU for inference when scoring with a machine learning pipeline is supported only on Azure Machine Learning compute.

When choosing a cluster SKU, first scale up and then scale out. Start with a machine that has 150% of the RAM your model requires, profile the result and find a machine that has the performance you need. Once you've learned that, increase the number of machines to fit your need for concurrent inference.
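That sizing heuristic can be sketched as simple arithmetic. The helper names and figures below are illustrative, not part of any Azure API:

```python
import math

def starting_node_ram_gb(model_ram_gb):
    # Scale up first: start with ~150% of the RAM the model requires.
    return 1.5 * model_ram_gb

def node_count(concurrent_requests, requests_per_node):
    # Then scale out: enough nodes to cover concurrent inference load.
    return math.ceil(concurrent_requests / requests_per_node)

print(starting_node_ram_gb(4))  # 6.0 (GB of RAM to start profiling with)
print(node_count(50, 8))        # 7 (nodes)
```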


  • Container instances are suitable only for small models less than 1 GB in size.
  • Use single-node AKS clusters for dev/test of larger models.

Define a deployment configuration

The options available for a deployment configuration differ depending on the compute target you choose.

The entries in the deploymentconfig.json document map to the parameters for LocalWebservice.deploy_configuration. The following table describes the mapping between the entities in the JSON document and the parameters for the method:

| JSON entity | Method parameter | Description |
| --- | --- | --- |
| computeType | NA | The compute target. For local targets, the value must be local. |
| port | port | The local port on which to expose the service's HTTP endpoint. |

This JSON is an example deployment configuration for use with the CLI:

    "computeType": "local",
    "port": 32267

For more information, see this reference.

Deploy your machine learning model

You are now ready to deploy your model.

Using a registered model

If you registered your model in your Azure Machine Learning workspace, replace "mymodel:1" with the name of your model and its version number.

az ml model deploy -m mymodel:1 --ic inferenceconfig.json --dc deploymentconfig.json

Using a local model

If you would prefer not to register your model, you can pass the "sourceDirectory" parameter in your inferenceconfig.json to specify a local directory from which to serve your model.

az ml model deploy --ic inferenceconfig.json --dc deploymentconfig.json

Understanding service state

During model deployment, you may see the service state change while it fully deploys.

The following table describes the different service states:

| Webservice state | Description | Final state? |
| --- | --- | --- |
| Transitioning | The service is in the process of deployment. | No |
| Unhealthy | The service has deployed but is currently unreachable. | No |
| Unschedulable | The service cannot be deployed at this time due to lack of resources. | No |
| Failed | The service has failed to deploy due to an error or crash. | Yes |
| Healthy | The service is healthy and the endpoint is available. | Yes |
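Because only Healthy and Failed are final, a common pattern is to poll until the service settles. The sketch below assumes a get_state callable that stands in for however you query the service state; it is not an Azure API:

```python
import time

FINAL_STATES = {'Healthy', 'Failed'}

def wait_for_final_state(get_state, poll_seconds=10, timeout_seconds=600):
    # Poll a state-returning callable until the service reaches a final state.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = get_state()
        if state in FINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError('service did not reach a final state in time')

# Stubbed example: the service transitions, then becomes healthy.
states = iter(['Transitioning', 'Unhealthy', 'Healthy'])
print(wait_for_final_state(lambda: next(states), poll_seconds=0))  # Healthy
```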


When deploying, Docker images for compute targets are built and loaded from Azure Container Registry (ACR). By default, Azure Machine Learning creates an ACR that uses the basic service tier. Changing the ACR for your workspace to standard or premium tier may reduce the time it takes to build and deploy images to your compute targets. For more information, see Azure Container Registry service tiers.


If you are deploying a model to Azure Kubernetes Service (AKS), we recommend that you enable Azure Monitor for that cluster. This will help you understand overall cluster health and resource usage.

If you try to deploy a model to an unhealthy or overloaded cluster, you can expect to experience issues. If you need help troubleshooting AKS cluster problems, contact AKS Support.

Batch inference

Azure Machine Learning Compute targets are created and managed by Azure Machine Learning. They can be used for batch prediction from Azure Machine Learning pipelines.

For a walkthrough of batch inference with Azure Machine Learning Compute, see How to run batch predictions.

IoT Edge inference

Support for deploying to the edge is in preview. For more information, see Deploy Azure Machine Learning as an IoT Edge module.

Delete resources

To delete a deployed webservice, use az ml service delete <name of webservice>.

To delete a registered model from your workspace, use az ml model delete <model id>.

Read more about deleting a webservice and deleting a model.

Next steps