How to operationalize a training pipeline with batch endpoints

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

In this article, you'll learn how to operationalize a training pipeline under a batch endpoint. The pipeline uses multiple components (or steps) that include model training, data preprocessing, and model evaluation.

You'll learn to:

  • Create and test a training pipeline
  • Deploy the pipeline to a batch endpoint
  • Modify the pipeline and create a new deployment in the same endpoint
  • Test the new deployment and set it as the default deployment

About this example

This example deploys a training pipeline that takes input training data (labeled) and produces a predictive model, along with the evaluation results and the transformations applied during preprocessing. The pipeline will use tabular data from the UCI Heart Disease Data Set to train an XGBoost model. We use a data preprocessing component to preprocess the data before it is sent to the training component to fit and evaluate the model.

A visualization of the pipeline is as follows:

A screenshot of the pipeline showing the preprocessing and training components.

The example in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy/paste YAML and other files, first clone the repo and then change directories to the folder:

git clone https://github.com/Azure/azureml-examples --depth 1
cd azureml-examples/cli

The files for this example are in:

cd endpoints/batch/deploy-pipelines/training-with-components

Follow along in Jupyter notebooks

You can follow along with the Python SDK version of this example by opening the sdk-deploy-and-test.ipynb notebook in the cloned repository.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.

  • An Azure Machine Learning workspace. If you don't have one, use the steps in the Manage Azure Machine Learning workspaces article to create one.

  • Ensure that you have the following permissions in the workspace:

    • Create or manage batch endpoints and deployments: Use an Owner, Contributor, or Custom role that allows Microsoft.MachineLearningServices/workspaces/batchEndpoints/*.

    • Create ARM deployments in the workspace resource group: Use an Owner, Contributor, or Custom role that allows Microsoft.Resources/deployments/write in the resource group where the workspace is deployed.

  • You need to install the following software to work with Azure Machine Learning:

    The Azure CLI and the ml extension for Azure Machine Learning.

    az extension add -n ml
    

    Note

Pipeline component deployments for Batch Endpoints were introduced in version 2.7 of the ml extension for Azure CLI. Use az extension update --name ml to get the latest version.

Connect to your workspace

The workspace is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we'll connect to the workspace in which you'll perform deployment tasks.

Pass in the values for your subscription ID, workspace, location, and resource group in the following code:

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
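
If you're following along with the Python SDK (for example, in the sdk-deploy-and-test.ipynb notebook), a minimal sketch for connecting to the workspace looks like the following. The placeholder values are the same ones you'd pass to the CLI:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Placeholders: use your own subscription ID, resource group, and workspace name
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

The SDK sketches in the rest of this article assume this ml_client object.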

Create the training pipeline component

In this section, we'll create all the assets required for our training pipeline. We'll begin by creating an environment that includes the necessary libraries to train the model. Then we'll create a compute cluster on which the batch deployment will run, and finally, we'll register the input data as a data asset.

Create the environment

The components in this example will use an environment with the XGBoost and scikit-learn libraries. The environment/conda.yml file contains the environment's configuration:

environment/conda.yml

channels:
- conda-forge
dependencies:
- python=3.8.5
- pip
- pip:
  - mlflow
  - azureml-mlflow
  - datasets
  - jobtools
  - cloudpickle==1.6.0
  - dask==2023.2.0
  - scikit-learn==1.1.2
  - xgboost==1.3.3
  - pandas==1.4
name: mlflow-env

Create the environment as follows:

  1. Define the environment:

    environment/xgboost-sklearn-py38.yml

    $schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
    name: xgboost-sklearn-py38
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
    conda_file: conda.yml
    description: An environment for models built with XGBoost and Scikit-learn.
    
  2. Create the environment:

    az ml environment create -f environment/xgboost-sklearn-py38.yml
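
If you're using the Python SDK instead, a rough equivalent of the two steps above is the following sketch. It reuses the ml_client created earlier and the same base image and conda file:

from azure.ai.ml.entities import Environment

# Same configuration as environment/xgboost-sklearn-py38.yml
environment = Environment(
    name="xgboost-sklearn-py38",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    conda_file="environment/conda.yml",
    description="An environment for models built with XGBoost and Scikit-learn.",
)
ml_client.environments.create_or_update(environment)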
    

Create a compute cluster

Batch endpoints and deployments run on compute clusters. They can run on any Azure Machine Learning compute cluster that already exists in the workspace, so multiple batch deployments can share the same compute infrastructure. In this example, we'll work with an Azure Machine Learning compute cluster called batch-cluster. Let's verify that this compute exists in the workspace, or create it if it doesn't.

az ml compute create -n batch-cluster --type amlcompute --min-instances 0 --max-instances 5
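
With the Python SDK, a roughly equivalent sketch (using the ml_client created earlier) is:

from azure.ai.ml.entities import AmlCompute

# Create the batch-cluster compute cluster if it doesn't already exist
compute = AmlCompute(name="batch-cluster", min_instances=0, max_instances=5)
ml_client.compute.begin_create_or_update(compute).result()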

Register the training data as a data asset

Our training data is represented in CSV files. To mimic a more production-level workload, we're going to register the training data in the heart.csv file as a data asset in the workspace. This data asset will later be indicated as an input to the endpoint.

az ml data create --name heart-classifier-train --type uri_folder --path data/train
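
The Python SDK equivalent is roughly the following sketch, again using the ml_client created earlier:

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register the local training folder as a uri_folder data asset
data = Data(
    name="heart-classifier-train",
    type=AssetTypes.URI_FOLDER,
    path="data/train",
)
ml_client.data.create_or_update(data)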

Create the pipeline

The pipeline we want to operationalize takes one input, the training data, and produces three outputs: the trained model, the evaluation results, and the data transformations applied as preprocessing. The pipeline consists of two components:

  • preprocess_job: This step reads the input data and returns the prepared data and the applied transformations. The step receives three inputs:
    • data: a folder containing the input data to transform and score
    • transformations: (optional) the path to the transformations to apply, if available. If the path isn't provided, the transformations are learned from the input data. Because the transformations input is optional, the preprocess_job component can be used during both training and scoring.
    • categorical_encoding: the encoding strategy for the categorical features (ordinal or onehot).
  • train_job: This step will train an XGBoost model based on the prepared data and return the evaluation results and the trained model. The step receives three inputs:
    • data: the preprocessed data.
    • target_column: the column that we want to predict.
    • eval_size: indicates the proportion of the input data used for evaluation.

The pipeline configuration is defined in the deployment-ordinal/pipeline.yml file:

deployment-ordinal/pipeline.yml

$schema: https://azuremlschemas.azureedge.net/latest/pipelineComponent.schema.json
type: pipeline

name: uci-heart-train-pipeline
display_name: uci-heart-train
description: This pipeline demonstrates how to train a machine learning classifier over the UCI heart dataset.

inputs:
  input_data:
    type: uri_folder

outputs: 
  model:
    type: mlflow_model
    mode: upload
  evaluation_results:
    type: uri_folder
    mode: upload
  prepare_transformations:
    type: uri_folder
    mode: upload

jobs:
  preprocess_job:
    type: command
    component: ../components/prepare/prepare.yml
    inputs:
      data: ${{parent.inputs.input_data}}
      categorical_encoding: ordinal
    outputs:
      prepared_data:
      transformations_output: ${{parent.outputs.prepare_transformations}}
  
  train_job:
    type: command
    component: ../components/train_xgb/train_xgb.yml
    inputs:
      data: ${{parent.jobs.preprocess_job.outputs.prepared_data}}
      target_column: target
      register_best_model: false
      eval_size: 0.3
    outputs:
      model: 
        mode: upload
        type: mlflow_model
        path: ${{parent.outputs.model}}
      evaluation_results:
        mode: upload
        type: uri_folder
        path: ${{parent.outputs.evaluation_results}}

Note

In the pipeline.yml file, the transformations input is missing from the preprocess_job; therefore, the script will learn the transformation parameters from the input data.

A visualization of the pipeline is as follows:

An image of the pipeline showing the job input, pipeline components, and the outputs at each step of the pipeline.

Test the pipeline

Let's test the pipeline with some sample data. To do that, we'll create a job using the pipeline and the batch-cluster compute cluster created previously.

The following pipeline-job.yml file contains the configuration for the pipeline job:

deployment-ordinal/pipeline-job.yml

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

experiment_name: uci-heart-train-pipeline
display_name: uci-heart-train-job
description: This pipeline demonstrates how to train a machine learning classifier over the UCI heart dataset.

compute: batch-cluster
component: pipeline.yml
inputs:
  input_data:
    type: uri_folder
outputs: 
  model:
    type: mlflow_model
    mode: upload
  evaluation_results:
    type: uri_folder
    mode: upload
  prepare_transformations:
    mode: upload

Create the test job:

az ml job create -f deployment-ordinal/pipeline-job.yml --set inputs.input_data.path=azureml:heart-classifier-train@latest
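
If you're working from the Python SDK, you can build and submit an equivalent test job with the dsl pipeline decorator. The following is a sketch that assumes the component paths and input/output names shown in pipeline.yml, plus the ml_client created earlier:

from azure.ai.ml import Input, load_component
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.constants import AssetTypes

# Load the same step components referenced by deployment-ordinal/pipeline.yml
prepare = load_component(source="components/prepare/prepare.yml")
train = load_component(source="components/train_xgb/train_xgb.yml")

@pipeline(display_name="uci-heart-train")
def uci_heart_trainer(input_data):
    # Same wiring as the YAML pipeline: preprocess first, then train on the prepared data
    preprocess_job = prepare(data=input_data, categorical_encoding="ordinal")
    train_job = train(
        data=preprocess_job.outputs.prepared_data,
        target_column="target",
        register_best_model=False,
        eval_size=0.3,
    )
    return {
        "model": train_job.outputs.model,
        "evaluation_results": train_job.outputs.evaluation_results,
        "prepare_transformations": preprocess_job.outputs.transformations_output,
    }

pipeline_job = uci_heart_trainer(
    input_data=Input(type=AssetTypes.URI_FOLDER, path="azureml:heart-classifier-train@latest")
)
pipeline_job.settings.default_compute = "batch-cluster"
job = ml_client.jobs.create_or_update(pipeline_job, experiment_name="uci-heart-train-pipeline")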

Create a batch endpoint

  1. Provide a name for the endpoint. A batch endpoint's name needs to be unique in each region because the name is used to construct the invocation URI. To ensure uniqueness, append a unique suffix to the name specified in the following code.

    ENDPOINT_NAME="uci-classifier-train"
    
  2. Configure the endpoint:

    The endpoint.yml file contains the endpoint's configuration.

    endpoint.yml

    $schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
    name: uci-classifier-train
    description: An endpoint to perform training of the Heart Disease Data Set prediction task.
    auth_mode: aad_token
    
  3. Create the endpoint:

    az ml batch-endpoint create --name $ENDPOINT_NAME -f endpoint.yml
    
  4. Query the endpoint URI:

    az ml batch-endpoint show --name $ENDPOINT_NAME
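
With the Python SDK, a rough equivalent of these steps (reusing the ml_client created earlier) is:

from azure.ai.ml.entities import BatchEndpoint

# Same definition as endpoint.yml; append a suffix to the name if it's already taken
endpoint = BatchEndpoint(
    name="uci-classifier-train",
    description="An endpoint to perform training of the Heart Disease Data Set prediction task.",
)
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

# Retrieve the endpoint to inspect its properties, including the invocation URI
print(ml_client.batch_endpoints.get(name=endpoint.name))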
    

Deploy the pipeline component

To deploy the pipeline component, we have to create a batch deployment. A deployment is a set of resources required for hosting the asset that does the actual work.

  1. Configure the deployment:

    The deployment-ordinal/deployment.yml file contains the deployment's configuration. You can check the full batch endpoint YAML schema for extra properties.

    deployment-ordinal/deployment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/pipelineComponentBatchDeployment.schema.json
    name: uci-classifier-train-xgb
    description: A sample deployment that trains an XGBoost model for the UCI dataset.
    endpoint_name: uci-classifier-train
    type: pipeline
    component: pipeline.yml
    settings:
        continue_on_step_failure: false
        default_compute: batch-cluster
    
  2. Create the deployment:

    Run the following code to create a batch deployment under the batch endpoint and set it as the default deployment.

    az ml batch-deployment create --endpoint $ENDPOINT_NAME -f deployment-ordinal/deployment.yml --set-default
    

    Tip

    Notice the use of the --set-default flag to indicate that this new deployment is now the default.

  3. Your deployment is ready for use.
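
If you're using the Python SDK, a roughly equivalent sketch is the following; it assumes the PipelineComponentBatchDeployment entity from azure-ai-ml and the ml_client created earlier:

from azure.ai.ml import load_component
from azure.ai.ml.entities import PipelineComponentBatchDeployment

# Same settings as deployment-ordinal/deployment.yml
deployment = PipelineComponentBatchDeployment(
    name="uci-classifier-train-xgb",
    description="A sample deployment that trains an XGBoost model for the UCI dataset.",
    endpoint_name="uci-classifier-train",
    component=load_component(source="deployment-ordinal/pipeline.yml"),
    settings={"continue_on_step_failure": False, "default_compute": "batch-cluster"},
)
ml_client.batch_deployments.begin_create_or_update(deployment).result()

# SDK equivalent of --set-default: update the endpoint's default deployment
endpoint = ml_client.batch_endpoints.get(name="uci-classifier-train")
endpoint.defaults.deployment_name = deployment.name
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()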

Test the deployment

Once the deployment is created, it's ready to receive jobs. Follow these steps to test it:

  1. Our deployment requires that we indicate one data input.

    The inputs.yml file contains the definition for the input data asset:

    inputs.yml

    inputs:
      input_data:
        type: uri_folder
        path: azureml:heart-classifier-train@latest
    

    Tip

    To learn more about how to indicate inputs, see Create jobs and input data for batch endpoints.

  2. You can invoke the default deployment as follows:

    JOB_NAME=$(az ml batch-endpoint invoke -n $ENDPOINT_NAME --file inputs.yml --query name -o tsv)
    
  3. You can monitor the progress of the job and stream the logs using:

    az ml job stream -n $JOB_NAME
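
With the Python SDK, invoking the default deployment and streaming the job logs looks roughly like the following sketch (the input name input_data matches the pipeline-level input):

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Invoke the default deployment with the registered data asset as input
job = ml_client.batch_endpoints.invoke(
    endpoint_name="uci-classifier-train",
    inputs={
        "input_data": Input(type=AssetTypes.URI_FOLDER, path="azureml:heart-classifier-train@latest")
    },
)
ml_client.jobs.stream(job.name)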
    

Notice that only the pipeline's inputs are published as inputs of the batch endpoint. For instance, categorical_encoding is an input of a step within the pipeline, but not an input of the pipeline itself. Use this fact to control which inputs you want to expose to your clients and which ones you want to hide.

Access job outputs

Once the job is completed, we can access some of its outputs. This pipeline produces the following outputs for its components:

  • preprocess job: output is transformations_output
  • train job: outputs are model and evaluation_results

You can download the associated results using:

az ml job download --name $JOB_NAME --output-name transformations
az ml job download --name $JOB_NAME --output-name model
az ml job download --name $JOB_NAME --output-name evaluation_results
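
A rough SDK equivalent, assuming job holds the invoked job from the previous step, is:

# Download each named output of the pipeline job into the current folder
for output_name in ["transformations", "model", "evaluation_results"]:
    ml_client.jobs.download(name=job.name, output_name=output_name, download_path=".")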

Create a new deployment in the endpoint

Endpoints can host multiple deployments at once, while keeping only one deployment as the default. Therefore, you can iterate over your different models, deploy the different models to your endpoint and test them, and finally, switch the default deployment to the model deployment that works best for you.

Let's change the way preprocessing is done in the pipeline to see if we get a model that performs better.

Change a parameter in the pipeline's preprocessing component

The preprocessing component has an input called categorical_encoding, which can have values ordinal or onehot. These values correspond to two different ways of encoding categorical features.

  • ordinal: Encodes each feature category with a numeric value from 1 to n, where n is the number of categories in the feature. Ordinal encoding implies that there's a natural rank order among the feature categories.
  • onehot: Doesn't imply a natural rank-ordered relationship, but it introduces a dimensionality problem if the number of categories is large.

We used ordinal encoding previously. Let's now change the categorical encoding to onehot and see how the model performs.

Tip

Alternatively, we could have exposed the categorical_encoding input to clients as an input to the pipeline job itself. However, we chose to change the parameter value in the preprocessing step so that we can hide and control the parameter inside the deployment and take advantage of having multiple deployments under the same endpoint.

  1. Modify the pipeline. It looks as follows:

    The pipeline configuration is defined in the deployment-onehot/pipeline.yml file:

    deployment-onehot/pipeline.yml

    $schema: https://azuremlschemas.azureedge.net/latest/pipelineComponent.schema.json
    type: pipeline
    
    name: uci-heart-train-pipeline
    display_name: uci-heart-train
    description: This pipeline demonstrates how to train a machine learning classifier over the UCI heart dataset.
    
    inputs:
      input_data:
        type: uri_folder
    
    outputs: 
      model:
        type: mlflow_model
        mode: upload
      evaluation_results:
        type: uri_folder
        mode: upload
      prepare_transformations:
        type: uri_folder
        mode: upload
    
    jobs:
      preprocess_job:
        type: command
        component: ../components/prepare/prepare.yml
        inputs:
          data: ${{parent.inputs.input_data}}
          categorical_encoding: onehot
        outputs:
          prepared_data:
          transformations_output: ${{parent.outputs.prepare_transformations}}
      
      train_job:
        type: command
        component: ../components/train_xgb/train_xgb.yml
        inputs:
          data: ${{parent.jobs.preprocess_job.outputs.prepared_data}}
          target_column: target
          eval_size: 0.3
        outputs:
          model: 
            type: mlflow_model
            path: ${{parent.outputs.model}}
          evaluation_results:
            type: uri_folder
            path: ${{parent.outputs.evaluation_results}}
    
  2. Configure the deployment:

    The deployment-onehot/deployment.yml file contains the deployment's configuration. You can check the full batch endpoint YAML schema for extra properties.

    deployment-onehot/deployment.yml

    $schema: https://azuremlschemas.azureedge.net/latest/pipelineComponentBatchDeployment.schema.json
    name: uci-classifier-train-onehot
    description: A sample deployment that trains an XGBoost model for the UCI dataset using onehot encoding for variables.
    endpoint_name: uci-classifier-train
    type: pipeline
    component: pipeline.yml
    settings:
        continue_on_step_failure: false
        default_compute: batch-cluster
    
  3. Create the deployment:

    Run the following code to create a batch deployment under the batch endpoint. Notice that this time we don't use --set-default, because we'll test the deployment before making it the default.

    az ml batch-deployment create --endpoint $ENDPOINT_NAME -f deployment-onehot/deployment.yml
    

  4. Your deployment is ready for use.

Test a nondefault deployment

Once the deployment is created, it's ready to receive jobs. We can test it in the same way we did before, but now we'll invoke a specific deployment:

  1. Invoke the deployment as follows, specifying the deployment parameter to trigger the specific deployment uci-classifier-train-onehot:

    DEPLOYMENT_NAME="uci-classifier-train-onehot"
    JOB_NAME=$(az ml batch-endpoint invoke -n $ENDPOINT_NAME -d $DEPLOYMENT_NAME --file inputs.yml --query name -o tsv)
    
  2. You can monitor the progress of the job and stream the logs using:

    az ml job stream -n $JOB_NAME
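
With the Python SDK, the only difference from the earlier invocation is the deployment_name argument; a sketch:

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Target the non-default deployment explicitly by name
job = ml_client.batch_endpoints.invoke(
    endpoint_name="uci-classifier-train",
    deployment_name="uci-classifier-train-onehot",
    inputs={
        "input_data": Input(type=AssetTypes.URI_FOLDER, path="azureml:heart-classifier-train@latest")
    },
)
ml_client.jobs.stream(job.name)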
    

Configure the new deployment as the default one

Once we're satisfied with the performance of the new deployment, we can set this new one as the default:

az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
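
The SDK equivalent is to update the endpoint's defaults and submit the change; a sketch:

# Point the endpoint's default at the new deployment
endpoint = ml_client.batch_endpoints.get(name="uci-classifier-train")
endpoint.defaults.deployment_name = "uci-classifier-train-onehot"
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()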

Delete the old deployment

Once you're done, you can delete the old deployment if you don't need it anymore:

az ml batch-deployment delete --name uci-classifier-train-xgb --endpoint-name $ENDPOINT_NAME --yes
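
Or, with the Python SDK (a sketch using the ml_client created earlier):

# Delete the previous (ordinal) deployment once it's no longer needed
ml_client.batch_deployments.begin_delete(
    name="uci-classifier-train-xgb", endpoint_name="uci-classifier-train"
).result()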

Clean up resources

Once you're done, delete the associated resources from the workspace:

Run the following code to delete the batch endpoint and its underlying deployment. --yes is used to confirm the deletion.

az ml batch-endpoint delete -n $ENDPOINT_NAME --yes

(Optional) Delete the compute cluster, unless you plan to reuse it with later deployments.

az ml compute delete -n batch-cluster
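
With the Python SDK, the same cleanup looks roughly like this:

# Delete the batch endpoint (and its deployments), then optionally the compute cluster
ml_client.batch_endpoints.begin_delete(name="uci-classifier-train").result()
ml_client.compute.begin_delete(name="batch-cluster").result()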

Next steps