Prepare data, train, deploy, and monitor machine learning models with Azure Pipelines

You can use a pipeline to automate the machine learning lifecycle. Some of the operations you can automate are:

  • Data preparation (extract, transform, load operations)
  • Training machine learning models with on-demand scale-out and scale-up
  • Deployment of machine learning models as public or private web services
  • Monitoring deployed machine learning models (such as for performance or data-drift analysis)

This article shows you how to create an Azure pipeline that trains a machine learning model and deploys it as a web service.

Prerequisites

Before you read this article, you should understand how the Azure Machine Learning service works.

Follow the steps in the Azure Machine Learning quickstart (portal) to create a workspace.

Get the code

Fork this repo in GitHub:

https://github.com/MicrosoftDocs/pipelines-azureml

The sample includes the pipeline definition diabetes-train-and-deploy.yml in the pipelines/ directory.

Sign in to Azure Pipelines

Sign in to Azure Pipelines. After you sign in, your browser goes to https://dev.azure.com/my-organization-name and displays your Azure DevOps dashboard.

Within your selected organization, create a project. If you don't have any projects in your organization, you see a Create a project to get started screen. Otherwise, select the Create Project button in the upper-right corner of the dashboard.

Create the pipeline

Use the following steps to create a new pipeline:

  1. Sign in to your Azure DevOps organization and navigate to your project.

  2. Go to Pipelines, and then select New Pipeline.

  3. Walk through the steps of the wizard by first selecting GitHub as the location of your source code.

  4. You might be redirected to GitHub to sign in. If so, enter your GitHub credentials.

  5. When the list of repositories appears, select your repository.

  6. You might be redirected to GitHub to install the Azure Pipelines app. If so, select Approve & install.

When your new pipeline appears:

  1. Replace myresourcegroup with the name of the Azure resource group that contains your Azure Machine Learning service workspace.

  2. Replace myworkspace with the name of your Azure Machine Learning service workspace.

  3. When you're ready, select Save and run.

  4. You're prompted to commit your changes to the diabetes-train-and-deploy.yml file in your repository. Edit the commit message if needed, and then select Save and run again.

    If you want to watch your pipeline in action, select the build job.
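If you'd rather make these two substitutions in a local clone of your fork and push them before you create the pipeline, sed can do it. The following is a minimal sketch: set_workspace_placeholders is a hypothetical helper, and it assumes the file still contains the literal placeholders myresourcegroup and myworkspace.

```shell
# Hypothetical helper: replace the workspace placeholders in a pipeline file.
# Assumes the literal strings "myresourcegroup" and "myworkspace" are present
# and that your names use only characters Azure allows (no "|" or "&").
set_workspace_placeholders() {
  local file="$1" resource_group="$2" workspace="$3"
  # -i.bak works with both GNU and BSD sed; the backup is deleted afterward.
  sed -i.bak \
    -e "s|myresourcegroup|${resource_group}|g" \
    -e "s|myworkspace|${workspace}|g" \
    "$file" && rm -f "${file}.bak"
}
```

For example, run set_workspace_placeholders pipelines/diabetes-train-and-deploy.yml my-aml-rg my-aml-ws, where my-aml-rg and my-aml-ws are hypothetical names, and then commit and push the change.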

You now have a YAML pipeline in your repository that's ready to train your model!

Azure Machine Learning service automation

There are two primary ways to use automation with the Azure Machine Learning service:

  • The Machine Learning CLI is an extension to the Azure CLI. It provides commands for working with the Azure Machine Learning service.
  • The Azure Machine Learning SDK is a Python package that provides programmatic access to the Azure Machine Learning service.
    • The Python SDK includes automated machine learning, which helps automate the time-consuming, iterative tasks of machine learning model development.

The example in this article uses the Machine Learning CLI.

Planning

Before you use Azure Pipelines to automate model training and deployment, you must understand the files needed by the model and what indicates a "good" trained model.

Machine learning files

In most cases, your data science team will provide the files and resources needed to train the machine learning model. In the example project, data scientists would provide these files:

  • Training script (train.py): The training script contains logic specific to the model that you're training.
  • Scoring file (score.py): When the model is deployed as a web service, the scoring file receives data from clients and scores it against the model. The output is then returned to the client.
  • RunConfig settings (sklearn.runconfig): Defines how the training script is run on the compute target that is used for training.
  • Training environment (myenv.yml): Defines the packages needed to run the training script.
  • Deployment configuration (deploymentConfig.yml): Defines the resources and compute needed for the deployment environment.
  • Inference configuration (inferenceConfig.yml): Defines the packages needed to run and score the model in the deployment environment.

Some of these files, such as train.py and score.py, are used directly when developing a model. However, data scientists might create the run configuration and environment settings programmatically. If so, they can generate the .runconfig and training environment files by using RunConfiguration.save(). Alternatively, you can create default run configuration files for all compute targets already in the workspace by running the following command:

az ml folder attach --experiment-name myexp -w myws -g mygroup

The files created by this command are stored in the .azureml directory.
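To catch a missing file before the pipeline spends time on training, a step can check for these files up front. The following minimal sketch uses a hypothetical check_ml_files helper and assumes all six files sit in one directory. The .runconfig file might live in the .azureml directory instead, so adjust the paths to match your layout.

```shell
# Hypothetical pre-flight check: verify that the files the data science team
# provides are present before training starts.
# Assumption: all files are in a single directory (default: current directory).
check_ml_files() {
  local dir="${1:-.}" missing=0 f
  for f in train.py score.py sklearn.runconfig myenv.yml \
           deploymentConfig.yml inferenceConfig.yml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Non-zero status lets the calling pipeline step fail early.
  return "$missing"
}
```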

Determine the best model

The example pipeline deploys the trained model without doing any performance checks. In a production scenario, you may want to log metrics so that you can determine the "best" model.

For example, suppose you have a model that is already deployed with an accuracy of 90%. You train a new model on new check-ins to the repository, and its accuracy is only 80%, so you don't want to deploy it. Because a single metric like this lets you rank models directly, you can use it to build automation logic. In other cases, you might use several metrics to decide which model is "best". Then choosing the best model requires human judgment.
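The single-metric rule in this example can be written as a small gate that a pipeline step runs before deployment. This is a minimal sketch. should_deploy is a hypothetical helper that assumes a higher-is-better metric, and the sketch doesn't show how you retrieve the two values from your workspace.

```shell
# Hypothetical deployment gate: succeed (exit 0) only when the candidate
# model's metric is strictly better than the deployed model's metric.
# Assumes a single higher-is-better metric such as accuracy.
should_deploy() {
  local candidate="$1" deployed="$2"
  # awk compares decimals, which the shell's [ ] test cannot; "+ 0" forces
  # a numeric comparison.
  awk -v c="$candidate" -v d="$deployed" 'BEGIN { exit !((c + 0) > (d + 0)) }'
}

# Values from the text: deployed model at 90, new model at 80.
if should_deploy 80 90; then
  echo "deploy"
else
  echo "skip deployment"   # prints "skip deployment"
fi
```

A non-zero exit status from the gate can then stop the deployment step. Requiring a strict improvement means a model that only ties the deployed one is not redeployed.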

Depending on what "best" looks like for your scenario, you may need to create a release pipeline where someone must inspect the metrics to determine if the model should be deployed.

To log metrics during training, use the Run class in the Azure Machine Learning SDK (for example, its log method).

Azure CLI Deploy task

The Azure CLI Deploy task is used to run Azure CLI commands. In the example, it installs the Azure Machine Learning CLI extension and then uses individual CLI commands to train and deploy the model.
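A build agent might already have the extension installed. The following minimal sketch installs it only when it's missing. ensure_ml_extension is a hypothetical helper, and the sketch assumes the extension name azure-cli-ml, which is the extension that provides the az ml folder attach command used in this article.

```shell
# Hypothetical helper: install the Machine Learning CLI extension only if
# "az extension show" reports that it isn't already installed.
# Assumes the extension name azure-cli-ml.
ensure_ml_extension() {
  if az extension show --name azure-cli-ml > /dev/null 2>&1; then
    echo "azure-cli-ml already installed"
  else
    az extension add --name azure-cli-ml
  fi
}
```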

Azure Service Connection

The Azure CLI Deploy task requires an Azure service connection. The Azure service connection stores the credentials needed to connect from Azure Pipelines to Azure.

The example uses a service connection named azmldemows.

To create a service connection, see Create an Azure service connection.

Machine Learning CLI

The example in this article uses the following Azure Machine Learning CLI commands:

  • az ml folder attach: Associates the files in the project with your Azure Machine Learning service workspace.
  • az ml computetarget create: Creates a compute target that is used to train the model.
  • az ml experiment list: Lists experiments for your workspace.
  • az ml run submit-script: Submits the training script as a run to train the model.
  • az ml model register: Registers a trained model with your workspace.
  • az ml model deploy: Deploys the model as a web service.
  • az ml service list: Lists deployed services.
  • az ml service delete: Deletes a deployed service.
  • az ml pipeline list: Lists Azure Machine Learning pipelines.
  • az ml computetarget delete: Deletes a compute target.

For more information on these commands, see the CLI extension reference.
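The two delete commands are useful for tearing down resources after a test run so they don't keep running. This minimal sketch wraps them in a hypothetical cleanup_aml_resources helper. The -n, -w, and -g flags are assumptions based on the az ml folder attach example earlier in this article, so confirm them in the CLI extension reference before you use the helper.

```shell
# Hypothetical cleanup helper: delete a deployed service and a compute target.
# Assumption: both commands accept -n (name), -w (workspace), -g (resource group).
cleanup_aml_resources() {
  local service="$1" compute="$2" workspace="$3" resource_group="$4"
  az ml service delete -n "$service" -w "$workspace" -g "$resource_group"
  az ml computetarget delete -n "$compute" -w "$workspace" -g "$resource_group"
}
```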

Next steps

Learn how you can further integrate machine learning into your pipelines with the Machine Learning extension.

For more examples of using Azure Pipelines with Azure Machine Learning service, see the following repos: