Track Machine Learning Training Runs

You track source properties, parameters, metrics, tags, and artifacts related to training a machine learning model in an MLflow run. Each run records the following information:

  • Source: Name of the notebook that launched the run or the project name and entry point for the run.
  • Version: Notebook revision if run from a notebook or Git commit hash if run from an MLflow Project.
  • Start & end time: Start and end time of the run.
  • Parameters: Key-value model parameters. Both keys and values are strings.
  • Tags: Key-value run metadata that you can update both while a run is in progress and after it completes (see the sketch following this list). Both keys and values are strings.
  • Metrics: Key-value model evaluation metrics. The value is numeric. Each metric can be updated throughout the course of the run (for example, to track how your model’s loss function is converging), and MLflow records and lets you visualize the metric’s history.
  • Artifacts: Output files in any format. For example, you can record images, models (for example, a pickled scikit-learn model), and data files (for example, a Parquet file) as an artifact.
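
A minimal sketch of setting a tag while a run is active and updating it after the run completes, assuming you are in an Azure Databricks notebook; the tag key and values are illustrative:

    import mlflow
    from mlflow.tracking import MlflowClient

    # Set a tag while the run is active
    with mlflow.start_run() as run:
        mlflow.set_tag("model_type", "baseline")

    # Update the same tag after the run has completed
    client = MlflowClient()
    client.set_tag(run.info.run_id, "model_type", "tuned")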

An MLflow experiment is the primary unit of organization and access control for MLflow runs; all MLflow runs belong to an experiment. Each experiment lets you visualize, search, and compare runs, as well as download run artifacts or metadata for analysis in other tools. The experiment UI lets you perform the following key tasks:

  • List and compare runs
  • Search for runs by parameter or metric value
  • Visualize run metrics
  • Download run results

You start runs and log parameters, metrics, tags, and artifacts using the MLflow Tracking API. The Tracking API communicates with an MLflow tracking server. When you log run data in Azure Databricks, the data is handled by an Azure Databricks hosted tracking server. Tracking to the hosted MLflow tracking server requires Databricks Runtime >= 5.0 and is supported in Python, Java, and R.

Experiments

Experiments are located in the Workspace file tree. An experiment’s name is the same as its workspace path. If you create an experiment using the mlflow.set_experiment(experiment_name) API, pass a workspace path such as /Users/<username>/<experiment-name>; Azure Databricks saves the experiment at that path.
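
A minimal sketch, assuming a notebook attached to a cluster with MLflow installed; the workspace path is a placeholder:

    import mlflow

    # In Azure Databricks the experiment name is a workspace path
    mlflow.set_experiment("/Users/<username>/my-experiment")

    with mlflow.start_run():
        mlflow.log_param("param1", 5)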

You can control who can view, edit, and manage experiments by enabling Workspace access control.

Create an experiment

  1. Click the Workspace button Workspace Icon or the Home button Home Icon in the sidebar. Do one of the following:

    • Next to any folder, click the Menu Dropdown on the right side of the text and select Create > Experiment.


    • In the Workspace or a user folder, click Down Caret and select Create > Experiment.

  2. In the Create Experiment dialog, enter a fully qualified path in the Workspace and an optional artifact location. If you do not specify an artifact location, artifacts are stored in dbfs:/databricks/mlflow/<experiment-id>.

    Azure Databricks supports DBFS and Azure Blob storage artifact locations.

    To store artifacts in Azure Blob storage, specify a URI of the form wasbs://<container>@<storage-account>.blob.core.windows.net/<path>. Artifacts stored in Azure Blob storage cannot be viewed in the MLflow UI; you must download them using a blob storage client. You can also set the artifact location programmatically, as shown in the sketch after these steps.

  3. Click Create. An empty experiment displays.

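You can also create an experiment programmatically; a minimal sketch using the MLflow Python API, where the workspace path and the wasbs URI are placeholders:

    import mlflow

    # artifact_location is optional; if omitted, artifacts are stored
    # in dbfs:/databricks/mlflow/<experiment-id>
    experiment_id = mlflow.create_experiment(
        "/Shared/my-experiment",
        artifact_location="wasbs://<container>@<storage-account>.blob.core.windows.net/<path>",
    )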

Display an experiment

  1. Click the Workspace button Workspace Icon or the Home button Home Icon in the sidebar.
  2. Navigate to a folder containing an experiment.
  3. Click the experiment name.

Delete an experiment

  1. Click the Workspace button Workspace Icon or the Home button Home Icon in the sidebar.
  2. Navigate to a folder containing an experiment.
  3. Click the Menu Dropdown at the right side of the experiment and select Move to Trash.

Notebook experiments

Every Python and R notebook in an Azure Databricks workspace has its own experiment. When you use MLflow in a notebook, it records runs in the notebook experiment.

A notebook experiment shares the same name and ID as its corresponding notebook. The notebook ID is the numerical identifier at the end of a Notebook URL.

Note

If you delete a notebook experiment using the API (for example, MlflowClient.delete_experiment() in Python), the notebook itself is moved into the Trash folder.

The MLflow Python API and R API can automatically detect the notebook experiment when you create a run.
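
For example, a minimal sketch run in a notebook cell; because no experiment is specified, the run is recorded in the notebook experiment:

    import mlflow

    # No experiment is set explicitly, so the run is created in the
    # experiment associated with this notebook
    with mlflow.start_run() as run:
        mlflow.log_param("param1", 5)

    print(run.info.experiment_id)  # ID of the notebook experiment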

View notebook experiments and runs

To view the MLflow runs associated with a notebook, click the MLflow Runs Link Icon icon in the notebook context bar:


From the Runs sidebar, you can view the run parameters and metrics:


Click the External Link icon in the Runs context bar to view the experiment:


In the Runs sidebar, click the date link of a run to view the run:


Access a tracking server from outside Azure Databricks
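
To log runs to the Azure Databricks hosted tracking server from outside the workspace, point MLflow at your workspace and supply credentials. A minimal sketch in Python, assuming you have a workspace URL and a personal access token; the host, token, and experiment path are placeholders:

    import os
    import mlflow

    # Placeholder workspace URL and personal access token
    os.environ["DATABRICKS_HOST"] = "https://<workspace-url>"
    os.environ["DATABRICKS_TOKEN"] = "<personal-access-token>"

    # Route tracking calls to the Azure Databricks hosted tracking server
    mlflow.set_tracking_uri("databricks")
    mlflow.set_experiment("/Users/<username>/my-experiment")

    with mlflow.start_run():
        mlflow.log_metric("foo", 1.0)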

Start run and record run data

You can start runs and record run data in Python, Java or Scala, and R. The following sections summarize the steps. For example notebooks, see the Quick Start.

Python

  1. Install the PyPI library mlflow[extras] to a cluster, where the extra dependencies are:

    • scikit-learn when Python version >= ‘3.5’
    • scikit-learn == 0.20 when Python version < ‘3.5’
    • boto3 >= 1.7.12
    • mleap >= 0.8.1
    • azure-storage
    • google-cloud-storage
  2. Import the MLflow library:

    import mlflow
    
  3. Start an MLflow run:

    with mlflow.start_run() as run:
    
  4. Log parameters, metrics, and artifacts:

    # Log a parameter (key-value pair)
    mlflow.log_param("param1", 5)
    
    # Log a metric; metrics can be updated throughout the run
    mlflow.log_metric("foo", 2, step=1)
    mlflow.log_metric("foo", 4, step=2)
    mlflow.log_metric("foo", 6, step=3)
    
    # Log an artifact (output file)
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    mlflow.log_artifact("output.txt")
    

Scala

  1. Install the PyPI library mlflow and the Maven library org.mlflow:mlflow-client:1.0.0 to a cluster.

  2. Import MLflow and file libraries:

    import org.mlflow.tracking.ActiveRun
    import org.mlflow.tracking.MlflowContext
    import java.io.{File,PrintWriter}
    
  3. Create MLflow context:

    val mlflowContext = new MlflowContext()
    
  4. Create the experiment if it does not exist, and set it as the active experiment:

    val experimentName = "/Shared/QuickStart"
    val client = mlflowContext.getClient()
    val experimentOpt = client.getExperimentByName(experimentName)
    if (!experimentOpt.isPresent()) {
      client.createExperiment(experimentName)
    }
    mlflowContext.setExperimentName(experimentName)
    
  5. Log parameters, metrics, and a file:

    import java.nio.file.Paths
    val run = mlflowContext.startRun("run")
    // Log a parameter (key-value pair)
    run.logParam("param1", "5")
    
    // Log a metric; metrics can be updated throughout the run
    run.logMetric("foo", 2.0, 1)
    run.logMetric("foo", 4.0, 2)
    run.logMetric("foo", 6.0, 3)
    
    new PrintWriter("/tmp/output.txt") { write("Hello, world!") ; close }
    run.logArtifact(Paths.get("/tmp/output.txt"))
    
  6. Close the run:

    run.endRun()
    

R

  1. Install the CRAN library mlflow to a cluster.

  2. Import and install MLflow libraries:

    library(mlflow)
    install_mlflow()
    
  3. Create a new run:

    run <- mlflow_start_run()
    
  4. Log parameters, metrics, and a file:

    # Log a parameter (key-value pair)
    mlflow_log_param("param1", 5)
    # Log a metric; metrics can be updated throughout the run
    mlflow_log_metric("foo", 2, step = 1)
    mlflow_log_metric("foo", 4, step = 2)
    mlflow_log_metric("foo", 6, step = 3)
    # Log an artifact (output file)
    write("Hello world!", file = "output.txt")
    mlflow_log_artifact("output.txt")
    
  5. Close the run:

    mlflow_end_run()
    

View and manage runs in experiments

Within an experiment you can perform many operations on its contained runs.

Filter runs

To filter runs by a parameter or metric name, type the name in the Filter Params or Filter Metrics field and press Enter.

To filter runs that match an expression containing parameter and metric values:

  1. In the Search Runs field, specify an expression. For example: metrics.r2 > 0.3. The same filter syntax can be used programmatically, as shown in the sketch after these steps.


  2. Click Search.
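
You can apply the same kind of filter expression programmatically; a minimal sketch using the MLflow Python search API (available in recent MLflow versions), where the experiment ID is a placeholder:

    import mlflow

    # Returns the matching runs as a pandas DataFrame
    runs = mlflow.search_runs(
        experiment_ids=["<experiment-id>"],
        filter_string="metrics.r2 > 0.3",
    )
    print(runs[["run_id", "metrics.r2"]])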

Download runs

  1. Select one or more runs.
  2. Click Download CSV. A CSV file is downloaded containing the following fields: Run ID, Name, Source Type, Source Name, User, Status, <parameter1>, <parameter2>, ..., <metric1>, <metric2>, ....

Display run details

Click the date link of a run to open the run details screen. The fields shown depend on whether you launched the run from a notebook or job, or remotely from a Git project.

Notebook

If the run was launched locally in an Azure Databricks notebook or job, the link in the Source field opens the specific notebook version used in the run.

Git project

If the run was launched remotely from a Git project, the link in the Source field opens the master branch of the Git project used in the run, and the link in the Git Commit field opens the specific version of the project used in the run.

Compare runs

  1. Select two or more runs.

  2. Click Compare. The Comparing Runs screen displays.

    Either select a metric name to display a graph of the metric, or select parameters and metrics from the X-axis and Y-axis drop-down lists to generate a scatter plot. Controls at the top right of the scatter plot let you manipulate the plot.

Delete runs

  1. Select the checkbox at the far left of one or more runs.

  2. Click Delete.


After you delete a run, you can still display it by selecting Deleted in the State field.
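
Runs can also be deleted and restored programmatically; a minimal sketch using the MLflow client, with a placeholder run ID:

    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    client.delete_run("<run-id>")   # marks the run as deleted
    client.restore_run("<run-id>")  # restores a deleted run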

Analyze MLflow runs using DataFrames

You can access MLflow run data programmatically using the following two DataFrame APIs:

  • The MLflow Python client search_runs API returns run data as a pandas DataFrame.
  • The MLflow experiment data source returns run data as an Apache Spark DataFrame.
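
A minimal sketch of both approaches, assuming a Databricks notebook where spark is the preconfigured SparkSession and the MLflow experiment data source is available; the experiment ID is a placeholder:

    import mlflow

    # MLflow Python client: run data as a pandas DataFrame
    runs_pdf = mlflow.search_runs(experiment_ids=["<experiment-id>"])

    # MLflow experiment data source: run data as an Apache Spark DataFrame
    runs_sdf = spark.read.format("mlflow-experiment").load("<experiment-id>")
    runs_sdf.select("run_id", "status", "start_time").show()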

Examples

The following notebooks demonstrate how to train several types of models while tracking the training in MLflow, and how to store the tracking data in Delta Lake.