Deploy models with the Azure Machine Learning service

The Azure Machine Learning service provides several ways you can deploy your trained model using the SDK. In this document, learn how to deploy your model as a web service in the Azure cloud, or to IoT Edge devices.

Important

Cross-origin resource sharing (CORS) is not currently supported when deploying a model as a web service.

You can deploy models to the following compute targets:

Compute target | Deployment type | Description
Azure Kubernetes Service (AKS) | Real-time inference | Good for high-scale production deployments. Provides autoscaling and fast response times.
Azure ML Compute | Batch inference | Run batch prediction on serverless compute. Supports normal and low-priority VMs.
Azure Container Instances (ACI) | Testing | Good for development or testing. Not suitable for production workloads.
Azure IoT Edge (Preview) | IoT module | Deploy models on IoT devices. Inferencing happens on the device.
Field-programmable gate array (FPGA) (Preview) | Web service | Ultra-low latency for real-time inferencing.

The process of deploying a model is similar for all compute targets:

  1. Train and register a model.
  2. Configure and register an image that uses the model.
  3. Deploy the image to a compute target.
  4. Test the deployment.
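
Condensed into code, the four steps above map to a handful of SDK calls. The sketch below wraps them in a function so nothing runs on import; the file names (model.pkl, score.py, myenv.yml) and resource names are placeholders, and the full versions of each call appear in the sections that follow:

```python
def deploy_workflow(ws):
    """Condensed register -> image -> deploy -> test flow (illustrative sketch)."""
    # Imports are deferred so the sketch can be read without the SDK installed.
    from azureml.core.model import Model
    from azureml.core.image import ContainerImage
    from azureml.core.webservice import AciWebservice, Webservice

    # 1. Register a trained model file with the workspace
    model = Model.register(workspace=ws, model_path="model.pkl", model_name="mymodel")

    # 2. Configure and register an image that wraps the model and its dependencies
    image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                      runtime="python",
                                                      conda_file="myenv.yml")
    image = ContainerImage.create(name="myimage", models=[model],
                                  image_config=image_config, workspace=ws)
    image.wait_for_creation(show_output=True)

    # 3. Deploy the image to a compute target (ACI in this sketch)
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    service = Webservice.deploy_from_image(workspace=ws, name="myservice",
                                           image=image, deployment_config=aci_config)
    service.wait_for_deployment(show_output=True)

    # 4. Test the deployment
    return service.run(input_data='{"data": [[1, 2, 3]]}')
```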

For more information on the concepts involved in the deployment workflow, see Manage, deploy, and monitor models with Azure Machine Learning Service.

Prerequisites

  • An Azure subscription. If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning service today.

  • An Azure Machine Learning service workspace and the Azure Machine Learning SDK for Python installed. Learn how to get these prerequisites using the Get started with Azure Machine Learning quickstart.

  • A trained model. If you do not have a trained model, use the steps in the Train models tutorial to train and register one with the Azure Machine Learning service.

    Note

    While the Azure Machine Learning service can work with any generic model that can be loaded in Python 3, the examples in this document demonstrate using a model stored in pickle format.

    For more information on using ONNX models, see the ONNX and Azure Machine Learning document.

Register a trained model

The model registry is a way to store and organize your trained models in the Azure cloud. Models are registered in your Azure Machine Learning service workspace. The model can be trained using Azure Machine Learning, or another service. To register a model from file, use the following code:

from azureml.core.model import Model

model = Model.register(model_path = "model.pkl",
                       model_name = "Mymodel",
                       tags = {"key": "0.1"},
                       description = "test",
                       workspace = ws)

Time estimate: Approximately 10 seconds.

For more information, see the reference documentation for the Model class.

Create and register an image

Deployed models are packaged as an image. The image contains the dependencies needed to run the model.

For Azure Container Instances, Azure Kubernetes Service, and Azure IoT Edge deployments, the azureml.core.image.ContainerImage class is used to create an image configuration. The image configuration is then used to create a new Docker image.

The following code demonstrates how to create a new image configuration:

from azureml.core.image import ContainerImage

# Image configuration
image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                 runtime = "python",
                                                 conda_file = "myenv.yml",
                                                 description = "Image with ridge regression model",
                                                 tags = {"data": "diabetes", "type": "regression"}
                                                 )

Time estimate: Approximately 10 seconds.

The important parameters in this example are described in the following table:

Parameter | Description
execution_script | Specifies a Python script that is used to receive requests submitted to the service. In this example, the script is contained in the score.py file. For more information, see the Execution script section.
runtime | Indicates that the image uses Python. The other option is spark-py, which uses Python with Apache Spark.
conda_file | Used to provide a conda environment file. This file defines the conda environment for the deployed model. For more information on creating this file, see Create an environment file (myenv.yml).

For more information, see the reference documentation for the ContainerImage class.

Execution script

The execution script receives data submitted to a deployed image, and passes it to the model. It then takes the response returned by the model and returns that to the client. The script is specific to your model; it must understand the data that the model expects and returns. The script usually contains two functions that load and run the model:

  • init(): Typically this function loads the model into a global object. This function is run only once when the Docker container is started.

  • run(input_data): This function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization. You can also work with raw binary data. You can transform the data before sending to the model, or before returning to the client.

Working with JSON data

The following example script accepts and returns JSON data. The run function transforms the data from JSON into a format that the model expects, and then transforms the response to JSON before returning it:

# import things required by this script
import json
import numpy as np
import os
import pickle
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression

from azureml.core.model import Model

# load the model
def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('sklearn_mnist')
    model = joblib.load(model_path)

# Passes data to the model and returns the prediction
def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
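
Because init() and run() are plain Python functions, you can smoke-test the JSON handling locally before building an image. The sketch below substitutes a stub for the real model (the stub class is illustrative and not part of the SDK):

```python
import json
import numpy as np

class StubModel:
    """Stands in for the real estimator; predicts the row-wise sum."""
    def predict(self, data):
        return data.sum(axis=1)

model = StubModel()

def run(raw_data):
    # Same shape as the scoring script: JSON in, JSON out
    data = np.array(json.loads(raw_data)['data'])
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())

print(run('{"data": [[1, 2, 3], [4, 5, 6]]}'))  # -> [6, 15]
```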

Working with binary data

If your model accepts binary data, use AMLRequest, AMLResponse, and rawhttp. The following example script accepts binary data and returns the reversed bytes for POST requests. For GET requests, it returns the full URL in the response body:

from azureml.contrib.services.aml_request  import AMLRequest, rawhttp
from azureml.contrib.services.aml_response import AMLResponse

def init():
    print("This is init()")

# Accept and return binary data
@rawhttp
def run(request):
    print("This is run()")
    print("Request: [{0}]".format(request))
    # handle GET requests
    if request.method == 'GET':
        respBody = str.encode(request.full_path)
        return AMLResponse(respBody, 200)
    # handle POST requests
    elif request.method == 'POST':
        reqBody = request.get_data(False)
        respBody = bytearray(reqBody)
        respBody.reverse()
        respBody = bytes(respBody)
        return AMLResponse(respBody, 200)
    else:
        return AMLResponse("bad request", 500)
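
The byte-reversal in the POST branch is ordinary Python and can be checked without the AML request and response wrappers:

```python
def reverse_bytes(req_body: bytes) -> bytes:
    """Mirrors the POST branch of run(): reverse the raw request payload."""
    body = bytearray(req_body)
    body.reverse()
    return bytes(body)

print(reverse_bytes(b"hello"))  # -> b'olleh'
```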

Important

The azureml.contrib namespace changes frequently as we work to improve the service. As such, anything in this namespace should be considered a preview and is not fully supported by Microsoft.

If you need to test this on your local development environment, you can install the components in the contrib namespace by using the following command:

pip install azureml-contrib-services

Register the image

Once you have created the image configuration, you can use it to register an image. This image is stored in the container registry for your workspace. Once created, you can deploy the same image to multiple services.

# Register the image from the image configuration
image = ContainerImage.create(name = "myimage", 
                              models = [model], #this is the model object
                              image_config = image_config,
                              workspace = ws
                              )

Time estimate: Approximately 3 minutes.

Images are versioned automatically when you register multiple images with the same name. For example, the first image registered as myimage is assigned an ID of myimage:1. The next time you register an image as myimage, the ID of the new image is myimage:2.

For more information, see the reference documentation for the ContainerImage class.

Deploy the image

The deployment process differs slightly depending on the compute target that you deploy to. Use the information in the following sections to learn how to deploy to:

  • Azure Container Instances
  • Azure Kubernetes Service
  • Azure ML Compute (batch inference)
  • Field-programmable gate arrays
  • Azure IoT Edge

Note

When deploying as a web service, there are three deployment methods you can use:

Method | Notes
deploy_from_image | You must register the model and create an image before using this method.
deploy | When using this method, you do not need to register the model or create the image. However, you cannot control the name of the model or image, or the associated tags and descriptions.
deploy_from_model | When using this method, you do not need to create an image. But you do not have control over the name of the image that is created.

The examples in this document use deploy_from_image.
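
For comparison, a deploy_from_model call looks roughly like the following. This is a sketch with the import deferred into the function body; it skips the explicit image-registration step, at the cost of control over the image name:

```python
def deploy_without_explicit_image(ws, model, image_config, deploy_config):
    """Sketch of deploy_from_model: the SDK builds and registers the image for you."""
    # Import deferred so the sketch is readable without the SDK installed.
    from azureml.core.webservice import Webservice

    service = Webservice.deploy_from_model(workspace=ws,
                                           name="myservice",
                                           models=[model],
                                           image_config=image_config,
                                           deployment_config=deploy_config)
    service.wait_for_deployment(show_output=True)
    return service
```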

Deploy to Azure Container Instances (DEVTEST)

Use Azure Container Instances for deploying your models as a web service if one or more of the following conditions is true:

  • You need to quickly deploy and validate your model.
  • You are testing a model that is under development.

To deploy to Azure Container Instances, use the following steps:

  1. Define the deployment configuration. The following example defines a configuration that uses one CPU core and 1 GB of memory:

    from azureml.core.webservice import AciWebservice
    
    aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                                   memory_gb = 1, 
                                                   tags = {"data": "mnist", "type": "classification"}, 
                                                   description = 'Handwriting recognition')
    
  2. To deploy the image created in the Create and register an image section of this document, use the following code:

    from azureml.core.webservice import Webservice
    
    service_name = 'aci-mnist-13'
    service = Webservice.deploy_from_image(deployment_config = aciconfig,
                                           image = image,
                                           name = service_name,
                                           workspace = ws)
    service.wait_for_deployment(show_output = True)
    print(service.state)
    

    Time estimate: Approximately 3 minutes.

For more information, see the reference documentation for the AciWebservice and Webservice classes.

Deploy to Azure Kubernetes Service (PRODUCTION)

To deploy your model as a high-scale production web service, use Azure Kubernetes Service (AKS). You can use an existing AKS cluster or create a new one using the Azure Machine Learning SDK, CLI, or the Azure portal.

Creating an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster, then you must create a new cluster the next time you need to deploy.

Azure Kubernetes Service provides the following capabilities:

  • Autoscaling
  • Logging
  • Model data collection
  • Fast response times for your web services
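
These capabilities are switched on through the deployment configuration. The sketch below uses parameter names from the v1 SDK's AksWebservice.deploy_configuration (verify them against the reference documentation for your SDK version) to enable autoscaling, model data collection, and request logging:

```python
def make_aks_config():
    """Sketch of an AKS deployment configuration with the listed capabilities enabled."""
    # Import deferred so the sketch is readable without the SDK installed.
    from azureml.core.webservice import AksWebservice

    return AksWebservice.deploy_configuration(
        autoscale_enabled=True,      # autoscaling: scale replicas with load
        autoscale_min_replicas=1,
        autoscale_max_replicas=4,
        collect_model_data=True,     # model data collection
        enable_app_insights=True)    # logging via Application Insights
```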

Create a new cluster

To create a new Azure Kubernetes Service cluster, use the following code:

Important

Creating the AKS cluster is a one-time process for your workspace. Once created, you can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, then you must create a new cluster the next time you need to deploy. For provisioning_configuration(), if you pick custom values for agent_count and vm_size, make sure that agent_count multiplied by the number of virtual CPUs in vm_size is greater than or equal to 12. For example, if you use a vm_size of "Standard_D3_v2", which has 4 virtual CPUs, pick an agent_count of 3 or greater.

from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (you can also provide parameters to customize this)
prov_config = AksCompute.provisioning_configuration()

aks_name = 'aml-aks-1' 
# Create the cluster
aks_target = ComputeTarget.create(workspace = ws,
                                  name = aks_name,
                                  provisioning_configuration = prov_config)

# Wait for the create process to complete
aks_target.wait_for_completion(show_output = True)
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

Time estimate: Approximately 20 minutes.
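
If the default configuration doesn't fit, the custom sizing described in the note above might look like the following sketch (3 agents × 4 vCPUs each = 12 vCPUs, meeting the documented minimum; import deferred, illustrative only):

```python
def make_custom_aks_provisioning():
    """Sketch: custom AKS sizing that satisfies the 12-vCPU minimum."""
    # Import deferred so the sketch is readable without the SDK installed.
    from azureml.core.compute import AksCompute

    # 3 x Standard_D3_v2 (4 virtual CPUs each) = 12 virtual CPUs
    return AksCompute.provisioning_configuration(agent_count=3,
                                                 vm_size="Standard_D3_v2")
```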

Use an existing cluster

If you already have an AKS cluster in your Azure subscription, and it is version 1.11.*, you can use it to deploy your image. The following code demonstrates how to attach an existing cluster to your workspace:

from azureml.core.compute import AksCompute, ComputeTarget
# Set the resource group that contains the AKS cluster and the cluster name
resource_group = 'myresourcegroup'
cluster_name = 'mycluster'

# Attach the cluster to your workspace
attach_config = AksCompute.attach_configuration(resource_group = resource_group,
                                                cluster_name = cluster_name)
aks_target = ComputeTarget.attach(ws, 'mycompute', attach_config)

# Wait for the operation to complete
aks_target.wait_for_completion(True)

Time estimate: Approximately 3 minutes.

Deploy the image

To deploy the image created in the Create and register an image section of this document to the Azure Kubernetes Service cluster, use the following code:

from azureml.core.webservice import Webservice, AksWebservice

# Set configuration and service name
aks_config = AksWebservice.deploy_configuration()
aks_service_name = 'aks-service-1'
# Deploy from image
service = Webservice.deploy_from_image(workspace = ws,
                                       name = aks_service_name,
                                       image = image,
                                       deployment_config = aks_config,
                                       deployment_target = aks_target)
# Wait for the deployment to complete
service.wait_for_deployment(show_output = True)
print(service.state)

Time estimate: Approximately 3 minutes.

For more information, see the reference documentation for the AksWebservice and Webservice classes.

Inference with Azure ML Compute

Azure ML compute targets are created and managed by the Azure Machine Learning service. They can be used for batch prediction from Azure ML Pipelines.

For a walkthrough of batch inference with Azure ML Compute, read the How to Run Batch Predictions document.

Deploy to field-programmable gate arrays (FPGA)

Project Brainwave makes it possible to achieve ultra-low latency for real-time inferencing requests. Project Brainwave accelerates deep neural networks (DNN) deployed on field-programmable gate arrays in the Azure cloud. Commonly used DNNs are available as featurizers for transfer learning, or customizable with weights trained from your own data.

For a walkthrough of deploying a model using Project Brainwave, see the Deploy to an FPGA document.

Deploy to Azure IoT Edge

An Azure IoT Edge device is a Linux or Windows-based device that runs the Azure IoT Edge runtime. Machine learning models can be deployed to these devices as IoT Edge modules. Deploying a model to an IoT Edge device allows the device to use the model directly, instead of having to send data to the cloud for processing. You get faster response times and less data transfer.

Azure IoT Edge modules are deployed to your device from a container registry. When you create an image from your model, it is stored in the container registry for your workspace.

Set up your environment

Prepare the IoT device

Create an IoT hub and register a device, or reuse an existing hub and device, by using the following script:

ssh <yourusername>@<yourdeviceip>
sudo wget https://raw.githubusercontent.com/Azure/ai-toolkit-iot-edge/master/amliotedge/createNregister
sudo chmod +x createNregister
sudo ./createNregister <The Azure subscriptionID you want to use> <Resourcegroup to use or create for the IoT hub> <Azure location to use e.g. eastus2> <the Hub ID you want to use or create> <the device ID you want to create>

Save the resulting connection string, which appears after "cs": in the output.

Initialize your device by downloading the following script onto an Ubuntu x64 IoT Edge node or DSVM, and then running these commands:

ssh <yourusername>@<yourdeviceip>
sudo wget https://raw.githubusercontent.com/Azure/ai-toolkit-iot-edge/master/amliotedge/installIoTEdge
sudo chmod +x installIoTEdge
sudo ./installIoTEdge

The IoT Edge node is ready to receive the connection string for your IoT Hub. Look for the line device_connection_string: and paste the connection string from above in between the quotes.

You can also learn how to register your device and install the IoT runtime by following the Quickstart: Deploy your first IoT Edge module to a Linux x64 device document.

Get the container registry credentials

To deploy an IoT Edge module to your device, Azure IoT needs the credentials for the container registry that the Azure Machine Learning service stores Docker images in.

You can easily retrieve the necessary container registry credentials in two ways:

  • In the Azure portal:

    1. Sign in to the Azure portal.

    2. Go to your Azure Machine Learning service workspace and select Overview. To go to the container registry settings, select the Registry link.

      An image of the container registry entry

    3. Once in the container registry, select Access Keys and then enable the admin user.

      An image of the access keys screen

    4. Save the values for login server, username, and password.

  • With a Python script:

    1. Run the following Python script after the code you used earlier to create the image:

      # Get your container registry details
      container_reg = ws.get_details()["containerRegistry"]
      reg_name = container_reg.split("/")[-1]
      subscription_id = ws.subscription_id
      resource_group_name = ws.resource_group

      from azure.mgmt.containerregistry import ContainerRegistryManagementClient
      client = ContainerRegistryManagementClient(ws._auth, subscription_id)
      result = client.registries.list_credentials(resource_group_name, reg_name)
      username = result.username
      password = result.passwords[0].value
      print('ContainerURL: {}'.format(image.image_location))
      print('Servername: {}'.format(reg_name))
      print('Username: {}'.format(username))
      print('Password: {}'.format(password))
      
    2. Save the values for ContainerURL, servername, username, and password.

      These credentials are necessary to provide the IoT Edge device access to images in your private container registry.

Deploy the model to the device

You can deploy a model by running this script and providing the following information from the steps above: container registry name, username, password, image location URL, desired deployment name, IoT Hub name, and the device ID you created. You can do this in the VM by following these steps:

wget https://raw.githubusercontent.com/Azure/ai-toolkit-iot-edge/master/amliotedge/deploymodel
sudo chmod +x deploymodel
sudo ./deploymodel <ContainerRegistryName> <username> <password> <imageLocationURL> <DeploymentID> <IoTHubname> <DeviceID>

Alternatively, you can follow the steps in the Deploy Azure IoT Edge modules from the Azure portal document to deploy the image to your device. When configuring the Registry settings for the device, use the login server, username, and password for your workspace container registry.

Note

If you're unfamiliar with Azure IoT, see the following documents for information on getting started with the service:

Testing web service deployments

To test a web service deployment, you can use the run method of the Webservice object. In the following example, a JSON document is sent to a web service and the result is displayed. The data sent must match what the model expects. In this example, the data format matches the input expected by the diabetes model.

import json

test_sample = json.dumps({'data': [
    [1,2,3,4,5,6,7,8,9,10], 
    [10,9,8,7,6,5,4,3,2,1]
]})
test_sample = bytes(test_sample,encoding = 'utf8')

prediction = service.run(input_data = test_sample)
print(prediction)

The web service is a REST API, so you can create client applications in a variety of programming languages. For more information, see Create client applications to consume web services.
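
As a minimal illustration, the same payload can be posted to the service's scoring endpoint with only the standard library. Here, service.scoring_uri is assumed to be the endpoint URL exposed by the deployed web service; the helper functions are a sketch, not SDK code:

```python
import json
import urllib.request

def build_payload(rows):
    """Serialize rows into the {'data': [...]} shape the scoring script expects."""
    return json.dumps({'data': rows}).encode('utf8')

def score(scoring_uri, rows):
    """POST the payload to the scoring endpoint and decode the JSON response."""
    req = urllib.request.Request(scoring_uri,
                                 data=build_payload(rows),
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_payload([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
print(payload)  # -> b'{"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]}'
```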

Update the web service

When you create a new image, you must manually update each service that you want to use the new image. To update the web service, use the update method. The following code demonstrates how to update the web service to use a new image:

from azureml.core.webservice import Webservice
from azureml.core.image import Image

service_name = 'aci-mnist-3'
# Retrieve existing service
service = Webservice(name = service_name, workspace = ws)

# point to a different image
new_image = Image(workspace = ws, id="myimage2:1")

# Update the image used by the service
service.update(image = new_image)
print(service.state)

For more information, see the reference documentation for the Webservice class.

Clean up

To delete a deployed web service, use service.delete().

To delete an image, use image.delete().

To delete a registered model, use model.delete().

For more information, see the reference documentation for Webservice.delete(), Image.delete(), and Model.delete().

Troubleshooting

  • If there are errors during deployment, use service.get_logs() to view the service logs. The logged information may indicate the cause of the error.

  • The logs may contain an error that instructs you to set logging level to DEBUG. To set the logging level, add the following lines to your scoring script, create the image, and then create a service using the image:

    import logging
    logging.basicConfig(level=logging.DEBUG)
    

    This change enables additional logging, and may return more information on why the error is occurring.

Next steps