Quick start Python

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. MLflow provides simple APIs for logging metrics (for example, model loss), parameters (for example, learning rate), and fitted models, making it easy to analyze training results or deploy models later on.

Install MLflow

If you’re using Databricks Runtime for Machine Learning, MLflow is already installed. Otherwise, install the MLflow package from PyPI.
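For example, from a local shell (in a Databricks notebook, the `%pip` magic command serves the same purpose):

```shell
# Install the MLflow package from PyPI
pip install mlflow
```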

Automatically log training runs to MLflow

MLflow provides mlflow.<framework>.autolog() APIs to automatically log training code written in many ML frameworks. You can call this API before running training code to log model-specific metrics, parameters, and model artifacts.

TensorFlow

# Also autoinstruments tf.keras
import mlflow.tensorflow
mlflow.tensorflow.autolog()

Keras

# Use import mlflow.tensorflow and mlflow.tensorflow.autolog() if using tf.keras
import mlflow.keras
mlflow.keras.autolog()

XGBoost

import mlflow.xgboost
mlflow.xgboost.autolog()

LightGBM

import mlflow.lightgbm
mlflow.lightgbm.autolog()

Scikit-learn

import mlflow.sklearn
mlflow.sklearn.autolog()

PySpark

If you perform tuning with pyspark.ml, metrics and models are automatically logged to MLflow. See Apache Spark MLlib and automated MLflow tracking.

View results

After executing your machine learning code, you can view results using the Experiment Runs sidebar. See View notebook experiment for instructions on how to view the experiment, run, and notebook revision used in the quick start.

Track additional metrics, parameters, and models

You can log additional information by directly invoking the MLflow Tracking logging APIs.

  • Numerical metrics:

    import mlflow
    mlflow.log_metric("accuracy", 0.9)
    
  • Training parameters:

    import mlflow
    mlflow.log_param("learning_rate", 0.001)
    
  • Models:

    Scikit-learn

    import mlflow.sklearn
    mlflow.sklearn.log_model(model, "myModel")
    

PySpark

    import mlflow.spark
    mlflow.spark.log_model(model, "myModel")
    

XGBoost

    import mlflow.xgboost
    mlflow.xgboost.log_model(model, "myModel")
    

TensorFlow

    import mlflow.tensorflow
    mlflow.tensorflow.log_model(model, "myModel")
    

    Keras

    import mlflow.keras
    mlflow.keras.log_model(model, "myModel")
    

PyTorch

    import mlflow.pytorch
    mlflow.pytorch.log_model(model, "myModel")
    

spaCy

    import mlflow.spacy
    mlflow.spacy.log_model(model, "myModel")
    
  • Other artifacts (files):

    import mlflow
    mlflow.log_artifact("/tmp/my-file", "myArtifactPath")
    

Example notebooks

Requirements

Databricks Runtime 6.4 or above, or Databricks Runtime 6.4 ML or above.

Notebooks

The recommended way to get started using MLflow tracking with Python is to use the MLflow autolog() API. With MLflow’s autologging capabilities, a single line of code automatically logs the resulting model, the parameters used to create the model, and a model score. The following notebook shows you how to set up a run using autologging.

MLflow Autologging Quick Start Python notebook

Get notebook

If you need more control over the metrics logged for each training run, or want to log additional artifacts such as tables or plots, you can use the MLflow logging API functions demonstrated in the following notebook.

MLflow Logging API Quick Start Python notebook

Get notebook

Learn more