Deploy ML model to Kubernetes + overwrite previous endpoint

Axel Vandevelde 1 Reputation point
2020-08-26T07:12:45.377+00:00

I'm building a CI/CD pipeline in Azure DevOps for the deployment of my Machine Learning model to Azure Kubernetes Service. I have the following task in my YAML pipeline file (replaced some of the values with '...'):

- task: AzureCLI@1
  displayName: "Deploy to AKS"
  inputs:
    azureSubscription: '...'
    scriptLocation: inlineScript
    workingDirectory: $(Build.SourcesDirectory)/score
    inlineScript: |
      set -e # fail on error

      az ml model deploy --name 'aks-deploy-test' --model '$(MODEL_NAME):$(get_model.MODEL_VERSION)' \
      --compute-target $(AKS_COMPUTE_NAME) \
      --ic inference_config.yml \
      --dc deployment_config_aks.yml \
      -g ... --workspace-name ... \
      --overwrite -v

When I run the pipeline the first time, it successfully deployed the ML model and I can see the Endpoint in the Azure ML workspace. However, when I try to run the pipeline a second time (to deploy a newer version of the model), I get the error:

Error:
{
  "code": "KubernetesError",
  "statusCode": 400,
  "message": "Kubernetes Deployment Error",
  "details": [
    {
      "code": "Unschedulable",
      "message": "0/6 nodes are available: 4 Insufficient cpu, 6 Insufficient memory."
    },
    {
      "code": "DeploymentFailed",
      "message": "Couldn't schedule because the kubernetes cluster didn't have available resources after trying for 00:05:00.\nYou can address this error by either adding more nodes, changing the SKU of your nodes or changing the resource requirements of your service.\nPlease refer to https://aka.ms/debugimage#container-cannot-be-scheduled for more information."
    }
  ]
}

Isn't the --overwrite option in the az ml model deploy command supposed to completely overwrite the current deployment of the model? If so, why am I still getting this error, or is there a better way to deploy a newer version of the ML model to the same AKS cluster?

Azure Machine Learning
Azure Machine Learning
An Azure machine learning service for building and deploying models.
2,580 questions
Azure Kubernetes Service (AKS)
Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.
1,877 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Ramr-msft 17,616 Reputation points
    2020-08-28T04:02:16.747+00:00

    @Axel Vandevelde Thanks for the question. The error message literally indicates the issue, none of the nodes have CPUs available.
    Can you please add more details about the sku's. If Possible please share the link to the tutorial/documentation you were following.

    Enable AppInsights flag will start flowing the logs (for Application monitoring).
    https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-app-insights

    For cluster monitoring you can use log analytics via AKS.
    https://learn.microsoft.com/en-us/azure/aks/view-master-logs