Troubleshoot Docker deployment of models with Azure Kubernetes Service and Azure Container Instances

Learn how to troubleshoot and solve, or work around, common Docker deployment errors with Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) using Azure Machine Learning.

Prerequisites

Steps for Docker deployment of machine learning models

When deploying a model in Azure Machine Learning, the system performs a number of tasks.

The recommended approach for model deployment is via the Model.deploy() API using an Environment object as an input parameter. In this case, the service creates a base Docker image during the deployment stage and mounts the required models, all in one call. The basic deployment tasks are:

  1. Register the model in the workspace model registry.

  2. Define the inference configuration:

    1. Create an Environment object based on the dependencies you specify in the environment YAML file, or use one of the curated environments.
    2. Create an inference configuration (InferenceConfig object) based on the environment and the scoring script.
  3. Deploy the model to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS).

Learn more about this process in the Model Management introduction.

Before you begin

If you run into any issues, the first thing to do is break down the deployment task (described previously) into individual steps to isolate the problem.

Assuming you are using the new, recommended deployment method via the Model.deploy() API with an Environment object as an input parameter, your code can be broken down into three major steps:

  1. Register the model. Here is some sample code:

    from azureml.core.model import Model
    
    
    # register a model out of a run record
    model = best_run.register_model(model_name='my_best_model', model_path='outputs/my_model.pkl')
    
    # or, you can register a file or a folder of files as a model
    model = Model.register(model_path='my_model.pkl', model_name='my_best_model', workspace=ws)
    
  2. Define inference configuration for deployment:

    from azureml.core.model import InferenceConfig
    from azureml.core.environment import Environment
    
    
    # create inference configuration based on the requirements defined in the YAML
    myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
    inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
    
  3. Deploy the model using the inference configuration created in the previous step:

    from azureml.core.webservice import AciWebservice
    
    
    # deploy the model
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    aci_service = Model.deploy(workspace=ws,
                               name='my-service',
                               models=[model],
                               inference_config=inference_config,
                               deployment_config=aci_config)
    aci_service.wait_for_deployment(show_output=True)
    

Once you have broken down the deployment process into individual tasks, you can look at some of the most common errors.

Debug locally

If you encounter problems deploying a model to ACI or AKS, try deploying it as a local web service. Using a local web service makes it easier to troubleshoot problems. The Docker image containing the model is downloaded and started on your local system.

You can find a sample local deployment notebook in the MachineLearningNotebooks repo to explore a runnable example.

Warning

Local web service deployments are not supported for production scenarios.

To deploy locally, modify your code to use LocalWebservice.deploy_configuration() to create a deployment configuration. Then use Model.deploy() to deploy the service. The following example deploys a model (contained in the model variable) as a local web service:

from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice


# Create inference configuration based on the environment definition and the entry script
myenv = Environment.from_conda_specification(name="env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
# Create a local deployment, using port 8890 for the web service endpoint
deployment_config = LocalWebservice.deploy_configuration(port=8890)
# Deploy the service
service = Model.deploy(
    ws, "mymodel", [model], inference_config, deployment_config)
# Wait for the deployment to complete
service.wait_for_deployment(True)
# Display the port that the web service is available on
print(service.port)

If you are defining your own conda specification YAML, you must list azureml-defaults with version >= 1.0.45 as a pip dependency. This package contains the functionality needed to host the model as a web service.
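
If you would rather define the environment in code than in a YAML file, the following sketch shows one way to add azureml-defaults as a pip dependency by using the CondaDependencies class; the scikit-learn entry is only an illustrative training dependency.

from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.environment import Environment

# Build the environment in code instead of loading it from myenv.yml.
# azureml-defaults (>= 1.0.45) is required to host the model as a web service.
conda_deps = CondaDependencies.create(
    conda_packages=['scikit-learn'],               # example model dependency
    pip_packages=['azureml-defaults>=1.0.45'])     # required for web service hosting

myenv = Environment(name="myenv")
myenv.python.conda_dependencies = conda_deps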

At this point, you can work with the service as normal. For example, the following code demonstrates sending data to the service:

import json

test_sample = json.dumps({'data': [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]})

test_sample = bytes(test_sample, encoding='utf8')

prediction = service.run(input_data=test_sample)
print(prediction)

For more information on customizing your Python environment, see Create and manage environments for training and deployment.

Update the service

During local testing, you may need to update the score.py file to add logging or attempt to resolve any problems that you've discovered. To reload changes to the score.py file, use reload(). For example, the following code reloads the script for the service, and then sends data to it. The data is scored using the updated score.py file:

Important

The reload method is only available for local deployments. For information on updating a deployment to another compute target, see how to update your webservice.

service.reload()
print(service.run(input_data=test_sample))

Note

The script is reloaded from the location specified by the InferenceConfig object used by the service.

To change the model, Conda dependencies, or deployment configuration, use update(). The following example updates the model used by the service:

service.update([different_model], inference_config, deployment_config)

Delete the service

To delete the service, use delete().
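
For example, assuming service is the web service object from the earlier steps:

# remove the deployed web service
service.delete()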

Inspect the Docker log

You can print out detailed Docker engine log messages from the service object. You can view the log for ACI, AKS, and Local deployments. The following example demonstrates how to print the logs.

# if you already have the service object handy
print(service.get_logs())

# if you only know the name of the service (note there might be multiple services with the same name but different version number)
print(ws.webservices['mysvc'].get_logs())

If you see the line Booting worker with pid: <pid> occurring multiple times in the logs, it means there isn't enough memory to start the worker. You can address the error by increasing the value of memory_gb in deployment_config.
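
For example, the following sketch recreates the ACI deployment configuration from the earlier example with more memory before redeploying; the 4 GB value is only illustrative and should be sized to your model.

from azureml.core.webservice import AciWebservice

# allocate more memory so the worker process can start
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)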

Container cannot be scheduled

When deploying a service to an Azure Kubernetes Service compute target, Azure Machine Learning will attempt to schedule the service with the requested amount of resources. If, after 5 minutes, there are no nodes available in the cluster with the appropriate amount of resources, the deployment will fail with the message Couldn't Schedule because the kubernetes cluster didn't have available resources after trying for 00:05:00. You can address this error by adding more nodes, changing the SKU of your nodes, or changing the resource requirements of your service.

The error message will typically indicate which resource you need more of. For instance, an error message indicating 0/3 nodes are available: 3 Insufficient nvidia.com/gpu means that the service requires GPUs and none of the three nodes in the cluster has an available GPU. You can address this by adding more nodes if you are using a GPU SKU, switching to a GPU-enabled SKU if you are not, or changing your environment to not require GPUs.
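
As one illustration, the resources requested by the service come from its deployment configuration; the sketch below lowers the CPU and memory request for an AKS deployment (the values are only examples) so that the service can be scheduled on the existing nodes.

from azureml.core.webservice import AksWebservice

# request fewer resources so the service fits on the available nodes
aks_config = AksWebservice.deploy_configuration(cpu_cores=0.5, memory_gb=1)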

Service launch fails

After the image is successfully built, the system attempts to start a container using your deployment configuration. As part of the container startup process, the system invokes the init() function in your scoring script. If there are uncaught exceptions in the init() function, you might see a CrashLoopBackOff error in the error message.

Use the information in the Inspect the Docker log section to check the logs.
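
One way to make the cause of an init() failure visible in the Docker log is to catch exceptions, log them, and re-raise. The following is a minimal sketch; the model name matches the earlier registration example, and the use of joblib to load the model is an assumption about your scoring script.

import logging
import joblib
from azureml.core.model import Model

def init():
    global model
    try:
        # locate the registered model inside the container and load it
        model_path = Model.get_model_path(model_name='my_best_model')
        model = joblib.load(model_path)
    except Exception:
        # write the full traceback to the Docker log before failing
        logging.exception("init() failed")
        raise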

Function fails: get_model_path()

Often, the init() function in the scoring script calls the Model.get_model_path() function to locate a model file or a folder of model files in the container. If the model file or folder cannot be found, the function fails. The easiest way to debug this error is to run the following Python code in the container shell:

from azureml.core.model import Model
import logging
logging.basicConfig(level=logging.DEBUG)
print(Model.get_model_path(model_name='my-best-model'))

This example prints out the local path (relative to /var/azureml-app) in the container where your scoring script is expecting to find the model file or folder. Then you can verify if the file or folder is indeed where it is expected to be.

Setting the logging level to DEBUG may cause additional information to be logged, which may be useful in identifying the failure.

Function fails: run(input_data)

If the service is successfully deployed, but it crashes when you post data to the scoring endpoint, you can add an error-catching statement to your run(input_data) function so that it returns a detailed error message instead. For example:

def run(input_data):
    try:
        data = json.loads(input_data)['data']
        data = np.array(data)
        result = model.predict(data)
        return json.dumps({"result": result.tolist()})
    except Exception as e:
        result = str(e)
        # return error message back to the client
        return json.dumps({"error": result})

Note: Returning error messages from the run(input_data) call should be done for debugging purposes only. For security reasons, you should not return error messages this way in a production environment.

HTTP status code 502

A 502 status code indicates that the service has thrown an exception or crashed in the run() method of the score.py file. Use the information in this article to debug the file.

HTTP status code 503

Azure Kubernetes Service deployments support autoscaling, which allows replicas to be added to support additional load. However, the autoscaler is designed to handle gradual changes in load. If you receive large spikes in requests per second, clients may receive an HTTP status code 503.

There are two things that can help prevent 503 status codes:

  • Change the utilization level at which autoscaling creates new replicas.

    By default, autoscaling target utilization is set to 70%, which means that the service can handle spikes in requests per second (RPS) of up to 30%. You can adjust the utilization target by setting the autoscale_target_utilization to a lower value.

    Important

    This change does not cause replicas to be created faster; instead, they are created at a lower utilization threshold. For example, changing the value to 30% causes replicas to be created when 30% utilization occurs, instead of waiting until the service is 70% utilized.

    If the web service is already using the current max replicas and you are still seeing 503 status codes, increase the autoscale_max_replicas value to increase the maximum number of replicas.

  • Change the minimum number of replicas. Increasing the minimum replicas provides a larger pool to handle the incoming spikes.

    To increase the minimum number of replicas, set autoscale_min_replicas to a higher value. You can calculate the required replicas by using the following code, replacing values with values specific to your project:

    from math import ceil
    # target requests per second
    targetRps = 20
    # time to process the request (in seconds)
    reqTime = 10
    # Maximum requests per container
    maxReqPerContainer = 1
    # target_utilization. 70% in this example
    targetUtilization = .7
    
    concurrentRequests = targetRps * reqTime / targetUtilization
    
    # Number of container replicas
    replicas = ceil(concurrentRequests / maxReqPerContainer)
    

    Note

    If you receive request spikes larger than the new minimum replicas can handle, you may receive 503s again. For example, as traffic to your service increases, you may need to increase the minimum replicas.

For more information on setting autoscale_target_utilization, autoscale_max_replicas, and autoscale_min_replicas, see the AksWebservice module reference.
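
For illustration, the following sketch shows where these settings belong when you create the AKS deployment configuration; the values are examples only and should come from a calculation like the one above.

from azureml.core.webservice import AksWebservice

# example autoscale settings; tune these for your workload
aks_config = AksWebservice.deploy_configuration(
    autoscale_enabled=True,
    autoscale_min_replicas=3,          # larger pool to absorb traffic spikes
    autoscale_max_replicas=10,
    autoscale_target_utilization=30)   # create replicas at lower utilization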

HTTP status code 504

A 504 status code indicates that the request has timed out. The default timeout is 1 minute.

You can increase the timeout or try to speed up the service by modifying score.py to remove unnecessary calls. If these actions do not correct the problem, use the information in this article to debug the score.py file. The code may be in a non-responsive state or an infinite loop.
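
For an AKS deployment, one way to increase the timeout is through the scoring_timeout_ms setting of the deployment configuration; the two-minute value below is only an example.

from azureml.core.webservice import AksWebservice

# allow up to 2 minutes per scoring request instead of the 1-minute default
aks_config = AksWebservice.deploy_configuration(scoring_timeout_ms=120000)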

Advanced debugging

In some cases, you may need to interactively debug the Python code contained in your model deployment, for example, if the entry script is failing and the reason cannot be determined by additional logging. By using Visual Studio Code and debugpy, you can attach to the code running inside the Docker container. For more information, visit the interactive debugging in VS Code guide.
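
As a rough sketch of the approach described in that guide, the entry script can start a debugpy listener and wait for VS Code to attach before it continues; the port shown is an arbitrary example, and the container port forwarding and VS Code launch configuration are covered in the linked guide.

import debugpy

# Listen for a debugger inside the container and pause startup until VS Code attaches.
# Port 5678 is only an example and must be exposed from the container.
debugpy.listen(("0.0.0.0", 5678))
debugpy.wait_for_client()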

Next steps

Learn more about deployment: