Track experiment runs and deploy ML models with MLflow and Azure Machine Learning (preview)

In this article, learn how to enable MLflow's tracking URI and logging API, collectively known as MLflow Tracking, to connect Azure Machine Learning as the backend of your MLflow experiments.

Supported capabilities include:

  • Track and log experiment metrics and artifacts in your Azure Machine Learning workspace. If you already use MLflow Tracking for your experiments, the workspace provides a centralized, secure, and scalable location to store training metrics and models.

  • Submit training jobs with MLflow Projects with Azure Machine Learning backend support (preview). You can submit jobs locally with Azure Machine Learning tracking, or migrate your runs to the cloud, for example to an Azure Machine Learning Compute cluster.

  • Track and manage models in MLflow and Azure Machine Learning model registry.

  • Deploy your MLflow experiments as an Azure Machine Learning web service. By deploying as a web service, you can apply the Azure Machine Learning monitoring and data drift detection functionalities to your production models.

MLflow is an open-source library for managing the life cycle of your machine learning experiments. MLflow Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts, no matter where your experiment runs: locally on your computer, on a remote compute target, on a virtual machine, or in an Azure Databricks cluster.

Note

As an open-source library, MLflow changes frequently. As such, the functionality made available through the Azure Machine Learning and MLflow integration should be considered a preview, and is not fully supported by Microsoft.

The following diagram illustrates that with MLflow Tracking, you track an experiment's run metrics and store model artifacts in your Azure Machine Learning workspace.

[Diagram: MLflow Tracking stores run metrics and model artifacts in an Azure Machine Learning workspace]

Tip

The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see Monitoring Azure Machine Learning.

Compare MLflow and Azure Machine Learning clients

The following table summarizes the clients that can use Azure Machine Learning and their respective capabilities.

MLflow Tracking offers metric logging and artifact storage functionalities that are only otherwise available via the Azure Machine Learning Python SDK.

Capability | MLflow Tracking & Deployment | Azure Machine Learning Python SDK | Azure Machine Learning CLI | Azure Machine Learning studio
Manage workspace | | ✓ | ✓ | ✓
Use data stores | | ✓ | ✓ |
Log metrics | ✓ | ✓ | |
Upload artifacts | ✓ | ✓ | |
View metrics | ✓ | ✓ | ✓ | ✓
Manage compute | | ✓ | ✓ | ✓
Deploy models | ✓ | ✓ | ✓ | ✓
Monitor model performance | ✓ | | |
Detect data drift | | ✓ | | ✓

Prerequisites

  • Install MLflow.

  • Install the Azure Machine Learning SDK on your local computer. The SDK provides the connectivity for MLflow to access your workspace.

  • Create an Azure Machine Learning workspace.

Track local runs

MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from your local runs into your Azure Machine Learning workspace.

Import mlflow and the Workspace class to access MLflow's tracking URI and configure your workspace.

In the following code, the get_mlflow_tracking_uri() method returns a unique tracking URI address for the workspace, ws, and set_tracking_uri() points the MLflow tracking URI to that address.

import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

Note

The tracking URI is valid for up to an hour. If you restart your script after some idle time, use the get_mlflow_tracking_uri API to get a new URI.

Set the MLflow experiment name with set_experiment() and start your training run with start_run(). Then use log_metric() to activate the MLflow logging API and begin logging your training run metrics.

experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)

with mlflow.start_run():
    mlflow.log_metric('alpha', 0.03)
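
Artifacts are logged the same way and land in the workspace-backed artifact store. A minimal sketch, assuming the run produces a small text file you want to keep (the file name is an assumption):

with mlflow.start_run():
    mlflow.log_metric('alpha', 0.03)
    # Write a file locally, then upload it as a run artifact (file name is illustrative)
    with open('example_output.txt', 'w') as f:
        f.write('example artifact content')
    mlflow.log_artifact('example_output.txt')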

Track remote runs

Remote runs let you train your models on more powerful computes, such as GPU-enabled virtual machines or Machine Learning Compute clusters. See Use compute targets for model training to learn about different compute options.

MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from your remote runs into your Azure Machine Learning workspace. Any run with MLflow Tracking code in it will have metrics logged automatically to the workspace.

The following example conda environment includes mlflow and azureml-mlflow as pip packages.

name: sklearn-example
dependencies:
  - python=3.6.2
  - scikit-learn
  - matplotlib
  - numpy
  - pip:
    - mlflow
    - azureml-mlflow

In your driver script, configure your compute and training run environment with the Environment class. Then, construct a ScriptRunConfig with your remote compute as the compute target.
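
A minimal sketch of that configuration, assuming the conda file above is saved as conda_dependencies.yml and a compute target named cpu-cluster already exists in your workspace:

from azureml.core import Environment, Experiment, ScriptRunConfig

# Build the run environment from the conda specification shown above
env = Environment.from_conda_specification(name='sklearn-example',
                                           file_path='conda_dependencies.yml')

# Point the run at your training script and remote compute target (names are assumptions)
src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target='cpu-cluster',
                      environment=env)

In the training script itself, only standard MLflow logging calls are needed: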

import mlflow

with mlflow.start_run():
    mlflow.log_metric('example', 1.23)

With this compute and training run configuration, use the Experiment.submit() method to submit a run. This method automatically sets the MLflow tracking URI and directs the logging from MLflow to your workspace.

# Create (or reuse) an experiment and submit the configured run
exp = Experiment(workspace=ws, name='experiment-with-mlflow')
run = exp.submit(src)

Train with MLflow Projects

MLflow Projects allow you to organize and describe your code so that other data scientists (or automated tools) can run it. MLflow Projects with Azure Machine Learning enable you to track and manage your training runs in your workspace.

This example shows how to submit MLflow projects locally with Azure Machine Learning tracking.

Install the azureml-mlflow package to use MLflow Tracking with Azure Machine Learning on your experiments locally. Your experiments can run via a Jupyter notebook or code editor.

pip install azureml-mlflow

Import mlflow and the Workspace class to access MLflow's tracking URI and configure your workspace.

import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

Set the MLflow experiment name with set_experiment() and start your training run with start_run(). Then, use log_metric() to activate the MLflow logging API and begin logging your training run metrics.

experiment_name = 'experiment-with-mlflow-projects'
mlflow.set_experiment(experiment_name)

Create the backend configuration object to store necessary information for the integration, such as the compute target and the type of managed environment to use.

backend_config = {"USE_CONDA": False}

Add the azureml-mlflow package as a pip dependency to your environment configuration file in order to track metrics and key artifacts in your workspace.

name: mlflow-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - scikit-learn=0.19.1
  - pip
  - pip:
    - mlflow
    - azureml-mlflow
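
mlflow.projects.run() also expects an MLproject file at the project root (the uri passed below). A minimal sketch, assuming the environment file above is saved as conda.yaml and the entry point is a train.py script that takes an alpha parameter:

name: mlflow-example
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.3}
    command: "python train.py {alpha}"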

Submit the local run and ensure you set the parameter backend = "azureml". With this setting, you can submit runs locally and get the added support of automatic output tracking, log files, snapshots, and printed errors in your workspace. After submission, you can view your runs and metrics in the Azure Machine Learning studio.

local_env_run = mlflow.projects.run(uri=".",
                                    parameters={"alpha": 0.3},
                                    backend="azureml",
                                    use_conda=False,
                                    backend_config=backend_config)

View metrics and artifacts in your workspace

The metrics and artifacts from MLflow logging are kept in your workspace. To view them anytime, navigate to your workspace and find the experiment by name in Azure Machine Learning studio, or run the following code.

run.get_metrics()
ws.get_details()
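
Because the tracking URI points at your workspace, you can also read runs back with MLflow's own client API. A minimal sketch, assuming the tracking URI and experiment name were set as shown earlier:

from mlflow.tracking import MlflowClient

client = MlflowClient()
# Look up the experiment created earlier and print each run's metrics
experiment = client.get_experiment_by_name('experiment_with_mlflow')
for run_info in client.list_run_infos(experiment.experiment_id):
    print(run_info.run_id, client.get_run(run_info.run_id).data.metrics)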

Manage models

Register and track your models with the Azure Machine Learning model registry, which supports the MLflow model registry. Azure Machine Learning models are aligned with the MLflow model schema, making it easy to export and import these models across different workflows. MLflow-related metadata, such as the run ID, is also tagged with the registered model for traceability. Users can submit training runs, register, and deploy models produced from MLflow runs.

If you want to deploy and register your production ready model in one step, see Deploy and register MLflow models.

To register and view a model from a run, use the following steps:

  1. Once the run is complete, call the register_model() method.

    # The model folder produced from the run is registered. This includes the MLmodel file, model.pkl, and conda.yaml.
    run.register_model(model_name='my-model', model_path='model')
    
  2. View the registered model in your workspace with Azure Machine Learning studio.

    In the following example, the registered model my-model is tagged with MLflow tracking metadata.

    [Screenshot: the registered model my-model with MLflow tracking metadata]

  3. Select the Artifacts tab to see all the model files that align with the MLflow model schema (conda.yaml, MLmodel, model.pkl).

  4. Select MLmodel to see the MLmodel file generated by the run.

Deploy and register MLflow models

Deploying your MLflow experiments as an Azure Machine Learning web service allows you to apply the Azure Machine Learning model management and data drift detection capabilities to your production models.

To do so, you need to:

  1. Register your model.

  2. Determine which deployment configuration you want to use for your scenario.

    1. Azure Container Instance (ACI) is a suitable choice for a quick dev-test deployment.
    2. Azure Kubernetes Service (AKS) is suitable for scalable production deployments.

The following diagram demonstrates that with the MLflow deploy API, you can deploy your existing MLflow models as an Azure Machine Learning web service regardless of their framework (PyTorch, TensorFlow, scikit-learn, ONNX, and so on), and manage your production models in your workspace.

[Diagram: deploy MLflow models as Azure Machine Learning web services]

Deploy to ACI

Set up your deployment configuration with the deploy_configuration() method. You can also add tags and descriptions to help keep track of your web service.

from azureml.core.webservice import AciWebservice, Webservice

# Set the model path to the model folder created by your run
model_path = "model"

# Configure the deployment
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, 
                                                memory_gb=1, 
                                                tags={'method' : 'sklearn'}, 
                                                description='Diabetes model',
                                                location='eastus2')

Then, register and deploy the model in one step with MLflow's deploy method for Azure Machine Learning.

import mlflow.azureml

(webservice, model) = mlflow.azureml.deploy(model_uri='runs:/{}/{}'.format(run.id, model_path),
                                            workspace=ws,
                                            model_name='sklearn-model',
                                            service_name='diabetes-model-1',
                                            deployment_config=aci_config,
                                            tags=None, mlflow_home=None, synchronous=True)

webservice.wait_for_deployment(show_output=True)
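
After the deployment succeeds, you can send scoring requests directly to the service. The payload depends on your model's input schema; this sketch assumes a tabular scikit-learn model served by MLflow, which accepts pandas split-oriented JSON (the column names and values are assumptions):

import json

# Hypothetical two-column input in the pandas 'split' orientation expected by MLflow model servers
sample_input = json.dumps({
    "columns": ["age", "bmi"],
    "data": [[48, 21.6]]
})

prediction = webservice.run(sample_input)
print(prediction)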

Deploy to AKS

To deploy to AKS, first create an AKS cluster using the ComputeTarget.create() method. It may take 20-25 minutes to create a new cluster.

from azureml.core.compute import AksCompute, ComputeTarget

# Use the default configuration (can also provide parameters to customize)
prov_config = AksCompute.provisioning_configuration()

aks_name = 'aks-mlflow'

# Create the cluster
aks_target = ComputeTarget.create(workspace=ws, 
                                  name=aks_name, 
                                  provisioning_configuration=prov_config)

aks_target.wait_for_completion(show_output=True)

print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)

Set up your deployment configuration with the deploy_configuration() method. You can also add tags and descriptions to help keep track of your web service.

from azureml.core.webservice import Webservice, AksWebservice

# Set the web service configuration (using default here with app insights)
aks_config = AksWebservice.deploy_configuration(enable_app_insights=True, compute_target_name='aks-mlflow')

Then, register and deploy the model in one step with MLflow's deploy method for Azure Machine Learning.


# Web service creation and model registration in a single step
from azureml.core.webservice import AksWebservice, Webservice
import mlflow.azureml

# Set the model path to the model folder created by your run
model_path = "model"

(webservice, model) = mlflow.azureml.deploy(model_uri='runs:/{}/{}'.format(run.id, model_path),
                                            workspace=ws,
                                            model_name='sklearn-model',
                                            service_name='my-aks',
                                            deployment_config=aks_config,
                                            tags=None, mlflow_home=None, synchronous=True)

webservice.wait_for_deployment()

The service deployment can take several minutes.
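
Once wait_for_deployment() returns, you can confirm the service is healthy and pull its logs for troubleshooting. A minimal sketch using the webservice object from the previous step:

# Check the deployment state, scoring endpoint, and recent logs
print(webservice.state)
print(webservice.scoring_uri)
print(webservice.get_logs())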

Clean up resources

The logged metrics and artifacts in your workspace can't currently be deleted individually. If you don't plan to use them, delete the resource group that contains the storage account and workspace so that you don't incur any charges:

  1. In the Azure portal, select Resource groups on the far left.

  2. From the list, select the resource group you created.

  3. Select Delete resource group.

  4. Enter the resource group name. Then select Delete.

Example notebooks

The MLflow with Azure ML notebooks demonstrate and expand upon concepts presented in this article.

Next steps