Troubleshooting Azure Machine Learning service AKS and ACI deployments

Learn how to work around or solve common Docker deployment errors with Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) using Azure Machine Learning service.

When deploying a model in Azure Machine Learning service, the system performs a number of tasks. The deployment tasks are:

  1. Register the model in the workspace model registry.

  2. Build a Docker image, including:

    1. Download the registered model from the registry.
    2. Create a dockerfile, with a Python environment based on the dependencies you specify in the environment yaml file.
    3. Add your model files and the scoring script you supply in the dockerfile.
    4. Build a new Docker image using the dockerfile.
    5. Register the Docker image with the Azure Container Registry associated with the workspace.


    Depending on your code, image creation happen automatically without your input.

  3. Deploy the Docker image to Azure Container Instance (ACI) service or to Azure Kubernetes Service (AKS).

  4. Start up a new container (or containers) in ACI or AKS.

Learn more about this process in the Model Management introduction.

Before you begin

If you run into any issue, the first thing to do is to break down the deployment task (previous described) into individual steps to isolate the problem.

Breaking the deployment into tasks is helpful if you are using the Webservice.deploy() API, or Webservice.deploy_from_model() API, as both of these functions perform the aforementioned steps as a single action. Typically those APIs are convenient, but it helps to break up the steps when troubleshooting by replacing them with the below API calls.

  1. Register the model. Here's some sample code:

    # register a model out of a run record
    model = best_run.register_model(model_name='my_best_model', model_path='outputs/my_model.pkl')
    # or, you can register a file or a folder of files as a model
    model = Model.register(model_path='my_model.pkl', model_name='my_best_model', workspace=ws)
  2. Build the image. Here's some sample code:

    # configure the image
    image_config = ContainerImage.image_configuration(runtime="python",
    # create the image
    image = Image.create(name='myimg', models=[model], image_config=image_config, workspace=ws)
    # wait for image creation to finish
  3. Deploy the image as service. Here's some sample code:

    # configure an ACI-based deployment
    aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
    aci_service = Webservice.deploy_from_image(deployment_config=aci_config, 

Once you have broken down the deployment process into individual tasks, we can look at some of the most common errors.

Image building fails

If the Docker image cannot be built, the image.wait_for_creation() or service.wait_for_deployment() call fails with some error messages that can offer some clues. You can also find out more details about the errors from the image build log. Below is some sample code showing how to discover the image build log uri.

# if you already have the image object handy

# if you only know the name of the image (note there might be multiple images with the same name but different version number)

# list logs for all images in the workspace
for name, img in ws.images.items():
    print (, img.version, img.image_build_log_uri)

The image log uri is a SAS URL pointing to a log file stored in your Azure blob storage. Simply copy and paste the uri into a browser window and you can download and view the log file.

Azure Key Vault access policy and Azure Resource Manager templates

The image build can also fail due to a problem with the access policy on Azure Key Vault. This situation can occur when you use an Azure Resource Manager template to create the workspace and associated resources (including Azure Key Vault), multiple times. For example, using the template multiple times with the same parameters as part of a continuous integration and deployment pipeline.

Most resource creation operations through templates are idempotent, but Key Vault clears the access policies each time the template is used. Clearing the access policies breaks access to the Key Vault for any existing workspace that is using it. This condition results in errors when you try to create new images. The following are examples of the errors that you can receive:


Create image "myimage": An internal server error occurred. Please try again. If the problem persists, contact support.


image = ContainerImage.create(name = "myimage", models = [model], image_config = image_config, workspace = ws)
Creating image
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\azureml\core\image\", line 341, in create
  File "C:\Python37\lib\site-packages\requests\", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url:<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>/images?api-version=2018-11-19

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python37\lib\site-packages\azureml\core\image\", line 346, in create
    'Content: {}'.format(resp.status_code, resp.headers, resp.content))
azureml.exceptions._azureml_exception.WebserviceException: Received bad response from Model Management Service:
Response Code: 500
Headers: {'Date': 'Tue, 26 Feb 2019 17:47:53 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'api-supported-versions': '2018-03-01-preview, 2018-11-19', 'x-ms-client-request-id': '3cdcf791f1214b9cbac93076ebfb5167', 'x-ms-client-session-id': '', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains; preload'}
Content: b'{"code":"InternalServerError","statusCode":500,"message":"An internal server error occurred. Please try again. If the problem persists, contact support"}'


ERROR: {'Azure-cli-ml Version': None, 'Error': WebserviceException('Received bad response from Model Management Service:\nResponse Code: 500\nHeaders: {\'Date\': \'Tue, 26 Feb 2019 17:34:05
GMT\', \'Content-Type\': \'application/json\', \'Transfer-Encoding\': \'chunked\', \'Connection\': \'keep-alive\', \'api-supported-versions\': \'2018-03-01-preview, 2018-11-19\', \'x-ms-client-request-id\':
\'bc89430916164412abe3d82acb1d1109\', \'x-ms-client-session-id\': \'\', \'Strict-Transport-Security\': \'max-age=15724800; includeSubDomains; preload\'}\nContent:
b\'{"code":"InternalServerError","statusCode":500,"message":"An internal server error occurred. Please try again. If the problem persists, contact support"}\'',)}

To avoid this problem, we recommend one of the following approaches:

  • Do not deploy the template more than once for the same parameters. Or delete the existing resources before using the template to recreate them.
  • Examine the Key Vault access policies and then use these policies to set the accessPolicies property of the template.
  • Check if the Key Vault resource already exists. If it does, do not recreate it through the template. For example, add a parameter that allows you to disable the creation of the Key Vault resource if it already exists.

Debug locally

If you encounter problems deploying a model to ACI or AKS, try deploying it as a local web service. Using a local web service makes it easier to troubleshoot problems. The Docker image containing the model is downloaded and started on your local system.


Local web service deployments require a working Docker installation on your local system. Docker must be running before you deploy a local web service. For information on installing and using Docker, see


Local web service deployments are not supported for production scenarios.

To deploy locally, modify your code to use LocalWebservice.deploy_configuration() to create a deployment configuration. Then use Model.deploy() to deploy the service. The following example deploys a model (contained in the model variable) as a local web service:

from azureml.core.model import InferenceConfig
from azureml.core.webservice import LocalWebservice

# Create inference configuration. This creates a docker image that contains the model.
inference_config = InferenceConfig(runtime= "python", 

# Create a local deployment, using port 8890 for the web service endpoint
deployment_config = LocalWebservice.deploy_configuration(port=8890)
# Deploy the service
service = Model.deploy(ws, "mymodel", [model], inference_config, deployment_config)
# Wait for the deployment to complete
# Display the port that the web service is available on

At this point, you can work with the service as normal. For example, the following code demonstrates sending data to the service:

import json

test_sample = json.dumps({'data': [

test_sample = bytes(test_sample,encoding = 'utf8')

prediction =

Update the service

During local testing, you may need to update the file to add logging or attempt to resolve any problems that you've discovered. To reload changes to the file, use reload(). For example, the following code reloads the script for the service, and then sends data to it. The data is scored using the updated file:



The script is reloaded from the location specified by the InferenceConfig object used by the service.

To change the model, Conda dependencies, or deployment configuration, use update(). The following example updates the model used by the service:

service.update([different_model], inference_config, deployment_config)

Delete the service

To delete the service, use delete().

Inspect the Docker log

You can print out detailed Docker engine log messages from the service object. You can view the log for ACI, AKS, and Local deployments. The following example demonstrates how to print the logs.

# if you already have the service object handy

# if you only know the name of the service (note there might be multiple services with the same name but different version number)

Service launch fails

After the image is successfully built, the system attempts to start a container using your deployment configuration. As part of container starting-up process, the init() function in your scoring script is invoked by the system. If there are uncaught exceptions in the init() function, you might see CrashLoopBackOff error in the error message.

Use the info in the Inspect the Docker log section to check the logs.

Function fails: get_model_path()

Often, in the init() function in the scoring script, Model.get_model_path() function is called to locate a model file or a folder of model files in the container. If the model file or folder cannot be found, the function fails. The easiest way to debug this error is to run the below Python code in the Container shell:

import logging
from azureml.core.model import Model

This example prints out the local path (relative to /var/azureml-app) in the container where your scoring script is expecting to find the model file or folder. Then you can verify if the file or folder is indeed where it is expected to be.

Setting the logging level to DEBUG may cause additional information to be logged, which may be useful in identifying the failure.

Function fails: run(input_data)

If the service is successfully deployed, but it crashes when you post data to the scoring endpoint, you can add error catching statement in your run(input_data) function so that it returns detailed error message instead. For example:

def run(input_data):
        data = json.loads(input_data)['data']
        data = np.array(data)
        result = model.predict(data)
        return json.dumps({"result": result.tolist()})
    except Exception as e:
        result = str(e)
        # return error message back to the client
        return json.dumps({"error": result})

Note: Returning error messages from the run(input_data) call should be done for debugging purpose only. For security reasons, you should not return error messages this way in a production environment.

HTTP status code 503

Azure Kubernetes Service deployments support autoscaling, which allows replicas to be added to support additional load. However, the autoscaler is designed to handle gradual changes in load. If you receive large spikes in requests per second, clients may receive an HTTP status code 503.

There are two things that can help prevent 503 status codes:

  • Change the utilization level at which autoscaling creates new replicas.

    By default, autoscaling target utilization is set to 70%, which means that the service can handle spikes in requests per second (RPS) of up to 30%. You can adjust the utilization target by setting the autoscale_target_utilization to a lower value.


    This change does not cause replicas to be created faster. Instead, they are created at a lower utilization threshold. Instead of waiting until the service is 70% utilized, changing the value to 30% causes replicas to be created when 30% utilization occurs.

    If the web service is already using the current max replicas and you are still seeing 503 status codes, increase the autoscale_max_replicas value to increase the maximum number of replicas.

  • Change the minimum number of replicas. Increasing the minimum replicas provides a larger pool to handle the incoming spikes.

    To increase the minimum number of replicas, set autoscale_min_replicas to a higher value. You can calculate the required replicas by using the following code, replacing values with values specific to your project:

    from math import ceil
    # target requests per second
    targetRps = 20
    # time to process the request (in seconds)
    reqTime = 10
    # Maximum requests per container
    maxReqPerContainer = 1
    # target_utilization. 70% in this example
    targetUtilization = .7
    concurrentRequests = targetRps * reqTime / targetUtilization
    # Number of container replicas
    replicas = ceil(concurrentRequests / maxReqPerContainer)


    If you receive request spikes larger than the new minimum replicas can handle, you may receive 503s again. For example, as traffic to your service increases, you may need to increase the minimum replicas.

For more information on setting autoscale_target_utilization, autoscale_max_replicas, and autoscale_min_replicas for, see the AksWebservice module reference.

Next steps

Learn more about deployment: