Guidelines for deploying MLflow models

APPLIES TO: Azure CLI ml extension v2 (current)

In this article, learn how to deploy MLflow models to Azure Machine Learning for both real-time and batch inference, and about the different tools you can use to manage the deployments.

Deployment of MLflow models vs. custom models

Unlike custom model deployment in Azure Machine Learning, when you deploy MLflow models to Azure Machine Learning, you don't have to provide a scoring script or an environment for deployment. Instead, Azure Machine Learning automatically generates the scoring script and environment for you. This functionality is called no-code deployment.

For no-code deployment, Azure Machine Learning:

  • Ensures that all the package dependencies indicated in the MLflow model are satisfied.
  • Provides an MLflow base image or curated environment that contains the following items:
    • Packages required for Azure Machine Learning to perform inference, including mlflow-skinny.
    • A scoring script to perform inference.

Tip

Workspaces without public network access: Before you can deploy MLflow models to online endpoints without egress connectivity, you have to package the models (preview). By using model packaging, you can avoid the need for an internet connection, which Azure Machine Learning would otherwise require to dynamically install necessary Python packages for the MLflow models.

Python packages and dependencies

Azure Machine Learning automatically generates environments to run inference on MLflow models. To build the environments, Azure Machine Learning reads the conda dependencies that are specified in the MLflow model and adds any packages that are required to run the inferencing server. These extra packages vary, depending on your deployment type.

The following conda.yaml file shows an example of conda dependencies specified in an MLflow model.

conda.yaml

channels:
- conda-forge
dependencies:
- python=3.10.11
- pip<=23.1.2
- pip:
  - mlflow==2.7.1
  - cloudpickle==1.6.0
  - dataclasses==0.6
  - lz4==4.0.0
  - numpy==1.23.5
  - packaging==23.0
  - psutil==5.9.0
  - pyyaml==6.0
  - scikit-learn==1.1.2
  - scipy==1.10.1
  - uuid==1.30
name: mlflow-env

Warning

MLflow automatically detects packages when logging a model and pins the package versions in the model's conda dependencies. However, this automatic package detection might not always reflect your intentions or requirements. In such cases, consider logging models with a custom conda dependencies definition.
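
For example, the following sketch logs a scikit-learn model with an explicit pip requirements list instead of relying on automatic package detection. The model, training data, and pinned versions are illustrative; pip_requirements is the MLflow parameter that overrides the detected dependencies.

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small illustrative model.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Provide the dependency pins yourself instead of letting MLflow detect them.
custom_pip_requirements = [
    "scikit-learn==1.1.2",
    "numpy==1.23.5",
    "scipy==1.10.1",
]

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        pip_requirements=custom_pip_requirements,
    )

If you prefer to supply a full conda specification instead of a pip list, you can pass a conda_env dictionary or file path to log_model.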

Implications of using models with signatures

MLflow models can include a signature that indicates the expected inputs and their types. When such models are deployed to online or batch endpoints, Azure Machine Learning enforces that the number and types of the data inputs comply with the signature. If the input data can't be parsed as expected, the model invocation will fail.

You can inspect an MLflow model's signature by opening the MLmodel file associated with the model. For more information on how signatures work in MLflow, see Signatures in MLflow.

The following file shows the MLmodel file associated with an MLflow model.

MLmodel

artifact_path: model
flavors:
  python_function:
    env:
      conda: conda.yaml
      virtualenv: python_env.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    predict_fn: predict
    python_version: 3.10.11
  sklearn:
    code: null
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.1.2
mlflow_version: 2.7.1
model_uuid: 3f725f3264314c02808dd99d5e5b2781
run_id: 70f15bab-cf98-48f1-a2ea-9ad2108c28cd
signature:
  inputs: '[{"name": "age", "type": "double"}, {"name": "sex", "type": "double"},
    {"name": "bmi", "type": "double"}, {"name": "bp", "type": "double"}, {"name":
    "s1", "type": "double"}, {"name": "s2", "type": "double"}, {"name": "s3", "type":
    "double"}, {"name": "s4", "type": "double"}, {"name": "s5", "type": "double"},
    {"name": "s6", "type": "double"}]'
  outputs: '[{"type": "double"}]'

Tip

Signatures in MLflow models are optional but highly recommended, as they provide a convenient way to detect data compatibility issues early. For more information about how to log models with signatures, see Logging models with a custom signature, environment or samples.
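
As an illustration, the following sketch infers a signature from training data and attaches it when the model is logged. The dataset and model are placeholders; infer_signature builds the input and output schema from sample data, which produces a signature like the one in the MLmodel file above.

import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Train a small illustrative model; the dataset's columns (age, sex, bmi, bp, s1-s6)
# match the signature shown in the MLmodel example above.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=10).fit(X, y)

# Build the signature from sample inputs and predictions, then log it with the model.
signature = infer_signature(X, model.predict(X))

with mlflow.start_run():
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        signature=signature,
        input_example=X.head(5),
    )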

Models deployed in Azure Machine Learning vs. models deployed in the MLflow built-in server

MLflow includes built-in deployment tools that model developers can use to test models locally. For instance, you can serve a local instance of a model that's registered in the MLflow server registry by running mlflow models serve -m my_model, or score input files against it with the MLflow CLI command mlflow models predict.
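
If you prefer Python to the CLI, a comparable local check is to load the model and call predict directly. This is only a sketch; the model URI and sample values are placeholders.

import mlflow
import pandas as pd

# Load a model from the registry (or from a local path or run URI) and score a sample.
# "models:/my_model/1" is a placeholder URI; use your own model name and version.
model = mlflow.pyfunc.load_model("models:/my_model/1")

sample = pd.DataFrame(
    [[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]],
    columns=["age", "sex", "trestbps", "chol", "fbs", "restecg",
             "thalach", "exang", "oldpeak", "slope", "ca", "thal"],
)
print(model.predict(sample))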

Inferencing with batch vs. online endpoints

Azure Machine Learning supports deploying models to both online and batch endpoints. These endpoints run different inferencing technologies that can have different features.

Online endpoints are similar to the MLflow built-in server in that they provide a scalable, synchronous, and lightweight way to run models for inference.

On the other hand, batch endpoints are capable of running asynchronous inference over long-running inferencing processes that can scale to large amounts of data. The MLflow server currently lacks this capability, although a similar capability can be achieved by using Spark jobs. To learn more about batch endpoints and MLflow models, see Use MLflow models in batch deployments.

The sections that follow focus more on MLflow models deployed to Azure Machine Learning online endpoints.

Input formats

| Input type | MLflow built-in server | Azure Machine Learning Online Endpoints |
| --- | --- | --- |
| JSON-serialized pandas DataFrames in the split orientation | ✓ | ✓ |
| JSON-serialized pandas DataFrames in the records orientation | Deprecated | |
| CSV-serialized pandas DataFrames | ✓ | Use batch¹ |
| Tensor input format as JSON-serialized lists (tensors) and dictionary of lists (named tensors) | ✓ | ✓ |
| Tensor input formatted as in TF Serving's API | ✓ | |

¹ Consider using batch inferencing to process files. For more information, see Deploy MLflow models to batch endpoints.

Input structure

Regardless of the input type used, Azure Machine Learning requires you to provide inputs in a JSON payload, within the dictionary key input_data. Because this key isn't required when using the command mlflow models serve to serve models, payloads can't be used interchangeably for Azure Machine Learning online endpoints and the MLflow built-in server.

Important

MLflow 2.0 advisory: Notice that the payload's structure changed in MLflow 2.0.

The following payload examples illustrate the differences between a model deployed in the MLflow built-in server and one deployed to the Azure Machine Learning inferencing server.

Payload example for a JSON-serialized pandas DataFrame in the split orientation

{
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [
            [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
        ]
    }
}

Payload example for a tensor input

{
    "input_data": [
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 0, 233, 1, 2, 150, 0, 2.3, 3, 0, 2],
          [1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]
    ]
}

Payload example for a named-tensor input

{
    "input_data": {
        "tokens": [
          [0, 655, 85, 5, 23, 84, 23, 52, 856, 5, 23, 1]
        ],
        "mask": [
          [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
        ]
    }
}
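
For instance, to send the split-orientation payload shown earlier to an Azure Machine Learning online endpoint from Python, you can post it to the endpoint's scoring URI. This is a sketch; the URI, key, and values are placeholders, and you can equally invoke the endpoint with the Azure Machine Learning CLI or SDK.

import json
import urllib.request

# Placeholders: replace with your endpoint's scoring URI and key.
scoring_uri = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
api_key = "<endpoint-key>"

# The same split-orientation payload as above, wrapped in the required "input_data" key.
payload = {
    "input_data": {
        "columns": [
            "age", "sex", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"
        ],
        "index": [1],
        "data": [[1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 2]],
    }
}

request = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))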

For more information about MLflow built-in deployment tools, see Built-in deployment tools in the MLflow documentation.

Customize inference when deploying MLflow models

You might be used to authoring scoring scripts to customize how inferencing is executed for your custom models. However, when you deploy MLflow models to Azure Machine Learning, the decision about how inferencing should be executed is made by the model builder (the person who built the model), rather than by the DevOps engineer (the person who is trying to deploy it). Each model framework might automatically apply specific inference routines.

At any point, if you need to change how inference of an MLflow model is executed, you can do one of two things:

  • Change how your model is being logged in the training routine.
  • Customize inference with a scoring script at deployment time.

Change how your model is logged during training

When you log a model, using either mlflow.autolog or mlflow.<flavor>.log_model, the flavor used for the model decides how inference should be executed and what results the model returns. MLflow doesn't enforce any specific behavior for how the predict() function generates results.

In some cases, however, you might want to do some preprocessing or post-processing before and after your model is executed. At other times, you might want to change what is returned (for example, probabilities versus classes). One solution is to implement machine learning pipelines that move from inputs to outputs directly. For example, sklearn.pipeline.Pipeline or pyspark.ml.Pipeline are popular ways to implement pipelines, and are sometimes recommended for performance considerations. Another alternative is to customize how your model does inferencing, by using a custom model flavor.
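
The following sketch shows one way to customize inference through a custom model flavor: a pyfunc wrapper that returns class probabilities instead of predicted labels. The class name, artifact key, and training data are illustrative.

import joblib
import mlflow
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

class ProbabilityWrapper(mlflow.pyfunc.PythonModel):
    """Custom flavor that post-processes predictions into class probabilities."""

    def load_context(self, context):
        # Load the underlying estimator saved as an artifact alongside the model.
        self.classifier = joblib.load(context.artifacts["classifier"])

    def predict(self, context, model_input):
        # Return probabilities instead of hard class labels.
        return pd.DataFrame(self.classifier.predict_proba(model_input))

# Train an illustrative classifier and persist it as an artifact.
X, y = load_iris(return_X_y=True, as_frame=True)
classifier = LogisticRegression(max_iter=200).fit(X, y)
joblib.dump(classifier, "classifier.joblib")

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=ProbabilityWrapper(),
        artifacts={"classifier": "classifier.joblib"},
    )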

Customize inference with a scoring script

Although MLflow models don't require a scoring script, you can still provide one, if needed. You can use the scoring script to customize how inference is executed for MLflow models. For more information on how to customize inference, see Customizing MLflow model deployments (online endpoints) and Customizing MLflow model deployments (batch endpoints).

Important

If you choose to specify a scoring script for an MLflow model deployment, you also need to provide an environment for the deployment.
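
As a rough illustration, a scoring script for an MLflow model deployed to an online endpoint follows the usual init/run contract. This sketch assumes the model artifacts sit in a folder named model under AZUREML_MODEL_DIR and that requests use the split-orientation payload shown earlier; adjust both to match your deployment.

import json
import os

import mlflow
import pandas as pd

def init():
    global model
    # AZUREML_MODEL_DIR points to the registered model's root folder; "model" is
    # the subfolder that holds the MLflow artifacts in this example.
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model")
    model = mlflow.pyfunc.load_model(model_path)

def run(raw_data):
    payload = json.loads(raw_data)["input_data"]
    data = pd.DataFrame(payload["data"], columns=payload["columns"])

    # Customize pre- or post-processing here, for example validating columns
    # or reshaping the predictions before returning them.
    predictions = model.predict(data)
    return predictions.tolist()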

Deployment tools

Azure Machine Learning offers many ways to deploy MLflow models to online and batch endpoints. You can deploy models by using the following tools:

  • MLflow SDK
  • Azure Machine Learning CLI
  • Azure Machine Learning SDK for Python
  • Azure Machine Learning studio

Each workflow has different capabilities, particularly around the type of compute it can target. The following table shows the different capabilities.

| Scenario | MLflow SDK | Azure Machine Learning CLI/SDK | Azure Machine Learning studio |
| --- | --- | --- | --- |
| Deploy to managed online endpoints | See example¹ | See example¹ | See example¹ |
| Deploy to managed online endpoints (with a scoring script) | Not supported³ | See example | See example |
| Deploy to batch endpoints | Not supported³ | See example | See example |
| Deploy to batch endpoints (with a scoring script) | Not supported³ | See example | See example |
| Deploy to web services (ACI/AKS) | Legacy support² | Not supported² | Not supported² |
| Deploy to web services (ACI/AKS, with a scoring script) | Not supported³ | Legacy support² | Legacy support² |

¹ Deployment to online endpoints that are in workspaces with private link enabled requires you to package models before deployment (preview).

² We recommend switching to managed online endpoints instead.

³ MLflow (OSS) doesn't have the concept of a scoring script and doesn't support batch execution currently.
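
As one illustration, a no-code deployment to a managed online endpoint with the Azure Machine Learning SDK for Python might look like the following sketch. The subscription, workspace, model path, and endpoint names are placeholders, and the instance size is only an example.

from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint, Model
from azure.identity import DefaultAzureCredential

# Placeholders: point the client at your own subscription, resource group, and workspace.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Register the local MLflow model folder; no scoring script or environment is needed.
model = ml_client.models.create_or_update(
    Model(path="model", type=AssetTypes.MLFLOW_MODEL, name="my-mlflow-model")
)

# Create the endpoint and a deployment that serves the MLflow model.
endpoint = ml_client.online_endpoints.begin_create_or_update(
    ManagedOnlineEndpoint(name="my-mlflow-endpoint", auth_mode="key")
).result()

deployment = ml_client.online_deployments.begin_create_or_update(
    ManagedOnlineDeployment(
        name="default",
        endpoint_name=endpoint.name,
        model=model,
        instance_type="Standard_DS3_v2",
        instance_count=1,
    )
).result()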

Which deployment tool to use?

  • Use the MLflow SDK if both of these conditions apply:

    • You're familiar with MLflow, or you're using a platform that supports MLflow natively (like Azure Databricks).
    • You want to continue using the same set of methods from MLflow (see the sketch after this list).
  • Use the Azure Machine Learning CLI v2 if any of these conditions apply:

    • You're more familiar with the Azure Machine Learning CLI v2.
    • You want to automate deployments, using automation pipelines.
    • You want to keep deployment configuration in a git repository.
  • Use the Azure Machine Learning studio UI deployment if you want to quickly deploy and test models trained with MLflow.
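
For reference, a deployment through the MLflow SDK typically uses the deployments client from the azureml-mlflow plugin. This is only a sketch; the endpoint and model names and the deployment configuration are placeholders, and the tracking URI must already point to your Azure Machine Learning workspace.

import json

import mlflow
from mlflow.deployments import get_deploy_client

# The deployments client targets the workspace that the tracking URI points to.
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

# Illustrative endpoint name and deployment configuration file.
endpoint_name = "my-mlflow-endpoint"
deployment_client.create_endpoint(endpoint_name)

deploy_config = {"instance_type": "Standard_DS3_v2", "instance_count": 1}
with open("deployment_config.json", "w") as config_file:
    json.dump(deploy_config, config_file)

deployment_client.create_deployment(
    name="default",
    endpoint=endpoint_name,
    model_uri="models:/my-mlflow-model/1",
    config={"deploy-config-file": "deployment_config.json"},
)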