In this article, you learn how to automate hyperparameter tuning in Azure Machine Learning pipelines by using Azure Machine Learning CLI v2 or Azure Machine Learning SDK for Python v2.
Hyperparameters are adjustable parameters that let you control the model training process. Hyperparameter tuning is the process of finding the configuration of hyperparameters that results in the best performance. Azure Machine Learning lets you automate hyperparameter tuning and run experiments in parallel to efficiently optimize hyperparameters.
Prerequisites
An Azure subscription with an Azure Machine Learning workspace.
Create a command component with hyperparameter inputs
The Azure Machine Learning pipeline must have a command component with hyperparameter inputs. The following train.yml file from the example projects defines a trial component that has the c_value, kernel, and coef0 hyperparameter inputs and runs the source code located in the ./train-src folder.
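The full train.yml file is available in the example projects. A minimal sketch of the shape of such a command component definition follows; the environment reference, input types, and defaults shown here are illustrative assumptions, not the exact file contents:

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: train_model
display_name: train_model
inputs:
  data:
    type: uri_file
  c_value:
    type: number
    default: 1.0
  kernel:
    type: string
    default: rbf
  coef0:
    type: number
    default: 0
outputs:
  model_output:
    type: mlflow_model
  test_data:
    type: uri_folder
code: ./train-src
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest  # example curated environment
command: >-
  python train.py
  --data ${{inputs.data}}
  --C ${{inputs.c_value}}
  --kernel ${{inputs.kernel}}
  --coef0 ${{inputs.coef0}}
  --model_output ${{outputs.model_output}}
  --test_data ${{outputs.test_data}}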
The source code for this example is a single train.py file. This code executes in every trial of the sweep job.
# imports
import os
import mlflow
import argparse
import shutil

import pandas as pd
from pathlib import Path
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split


# define functions
def main(args):
    # enable auto logging
    mlflow.autolog()

    # setup parameters
    params = {
        "C": args.C,
        "kernel": args.kernel,
        "degree": args.degree,
        "gamma": args.gamma,
        "coef0": args.coef0,
        "shrinking": args.shrinking,
        "probability": args.probability,
        "tol": args.tol,
        "cache_size": args.cache_size,
        "class_weight": args.class_weight,
        "verbose": args.verbose,
        "max_iter": args.max_iter,
        "decision_function_shape": args.decision_function_shape,
        "break_ties": args.break_ties,
        "random_state": args.random_state,
    }

    # read in data
    df = pd.read_csv(args.data)

    # process data
    X_train, X_test, y_train, y_test = process_data(df, args.random_state)

    # train model
    model = train_model(params, X_train, X_test, y_train, y_test)

    # output the model and test data:
    # write to a local folder first, then copy to the output folder
    # (shutil.copytree replaces the deprecated distutils.dir_util.copy_tree,
    # which was removed in Python 3.12)
    mlflow.sklearn.save_model(model, "model")
    shutil.copytree("model", args.model_output, dirs_exist_ok=True)

    X_test.to_csv(Path(args.test_data) / "X_test.csv", index=False)
    y_test.to_csv(Path(args.test_data) / "y_test.csv", index=False)


def process_data(df, random_state):
    # split dataframe into X and y
    X = df.drop(["species"], axis=1)
    y = df["species"]

    # train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )

    # return split data
    return X_train, X_test, y_train, y_test


def train_model(params, X_train, X_test, y_train, y_test):
    # train model
    model = SVC(**params)
    model = model.fit(X_train, y_train)

    # return model
    return model


def parse_args():
    # setup arg parser
    parser = argparse.ArgumentParser()

    # add arguments
    parser.add_argument("--data", type=str)
    parser.add_argument("--C", type=float, default=1.0)
    parser.add_argument("--kernel", type=str, default="rbf")
    parser.add_argument("--degree", type=int, default=3)
    parser.add_argument("--gamma", type=str, default="scale")
    parser.add_argument("--coef0", type=float, default=0)
    # note: argparse treats any nonempty string as True for type=bool,
    # so these flags are effectively controlled through their defaults
    parser.add_argument("--shrinking", type=bool, default=False)
    parser.add_argument("--probability", type=bool, default=False)
    parser.add_argument("--tol", type=float, default=1e-3)
    parser.add_argument("--cache_size", type=float, default=1024)
    parser.add_argument("--class_weight", type=dict, default=None)
    parser.add_argument("--verbose", type=bool, default=False)
    parser.add_argument("--max_iter", type=int, default=-1)
    parser.add_argument("--decision_function_shape", type=str, default="ovr")
    parser.add_argument("--break_ties", type=bool, default=False)
    parser.add_argument("--random_state", type=int, default=42)
    parser.add_argument("--model_output", type=str, help="Path of output model")
    parser.add_argument("--test_data", type=str, help="Path of output test data")

    # parse args
    args = parser.parse_args()

    # return args
    return args


# run script
if __name__ == "__main__":
    # parse args
    args = parse_args()

    # run main function
    main(args)
Note
Make sure to log the metrics in the trial component source code with exactly the same name as the primary_metric value in the pipeline file. This example uses mlflow.autolog(), which is the recommended way to track machine learning experiments. For more information about MLflow, see Track ML experiments and models with MLflow.
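For instance, if you log a metric yourself instead of relying on autolog, the name must match the sweep's primary_metric character for character. The metric name training_f1_score and its value below are illustrative:

import mlflow

# The metric name must exactly match primary_metric in the sweep step.
mlflow.log_metric("training_f1_score", 0.92)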
Create a pipeline with a hyperparameter sweep step
Given the command component defined in train.yml, the following code creates a two-step train-and-predict pipeline definition file. In the sweep_step, the required step type is sweep, and the c_value, kernel, and coef0 hyperparameter inputs for the trial component are added to the search_space.
The following example highlights the hyperparameter tuning sweep_step.
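Because the full pipeline file is long, only the sweep step is sketched here. This is a minimal illustration assuming random sampling and an F1 metric logged by autolog; the parent input and output bindings, metric name, search ranges, and limits are placeholders to adapt:

sweep_step:
  type: sweep
  inputs:
    data: ${{parent.inputs.pipeline_job_data_path}}
  outputs:
    model_output: ${{parent.outputs.pipeline_job_trained_model}}
    test_data: ${{parent.outputs.pipeline_job_test_data}}
  trial: ./train.yml
  sampling_algorithm: random
  search_space:
    c_value:
      type: uniform
      min_value: 0.5
      max_value: 0.9
    kernel:
      type: choice
      values: ["rbf", "linear", "poly"]
    coef0:
      type: uniform
      min_value: 0.1
      max_value: 1
  objective:
    goal: maximize
    primary_metric: training_f1_score
  limits:
    max_total_trials: 20
    max_concurrent_trials: 5
    timeout: 7200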
In the v2 SDK, you can enable hyperparameter tuning for any command component by calling the .sweep() method. The following pipeline definition shows how to enable sweep for train_model.
The example first loads the train_component_func defined in the train.yml file. To create train_model, the code binds the c_value, kernel, and coef0 hyperparameters to search space distributions. The sweep_step then defines the primary_metric, sampling_algorithm, and other parameters.
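The full pipeline definition isn't reproduced here. The following is a minimal sketch of the pattern, assuming the train.yml component described earlier; the metric name, search ranges, and limits are illustrative:

from azure.ai.ml import dsl, load_component
from azure.ai.ml.sweep import Choice, Uniform

# load the command component defined in train.yml
train_component_func = load_component(source="./train.yml")

@dsl.pipeline(description="Tune hyperparameters by using a sweep step")
def pipeline_with_hyperparameter_sweep(pipeline_input_data):
    # bind the searchable inputs to distributions to define the search space
    train_model = train_component_func(
        data=pipeline_input_data,
        c_value=Uniform(min_value=0.5, max_value=0.9),
        kernel=Choice(values=["rbf", "linear", "poly"]),
        coef0=Uniform(min_value=0.1, max_value=1),
    )

    # calling .sweep() turns the command component into a sweep step
    sweep_step = train_model.sweep(
        primary_metric="training_f1_score",
        goal="maximize",
        sampling_algorithm="random",
    )
    sweep_step.set_limits(max_total_trials=20, max_concurrent_trials=5, timeout=7200)

    return {"trained_model": sweep_step.outputs.model_output}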
After you submit this pipeline job, Azure Machine Learning runs the trial component multiple times to sweep over hyperparameters, based on the search space and limits you defined in the sweep_step.
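For example, continuing the SDK sketch above, a submission might look like the following; the workspace details, compute target, and data path are placeholders:

from azure.ai.ml import Input, MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# build the pipeline job from the pipeline function defined above
pipeline_job = pipeline_with_hyperparameter_sweep(
    pipeline_input_data=Input(type="uri_file", path="./data/iris.csv"),
)
pipeline_job.settings.default_compute = "cpu-cluster"  # assumed compute target

returned_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="sweep_in_pipeline"
)
print(returned_job.studio_url)  # web URL for the pipeline graph in studio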
View hyperparameter tuning results in studio
After you submit a pipeline job, the SDK or CLI widget gives you a web URL link to the pipeline graph in the Azure Machine Learning studio UI.
To view hyperparameter tuning results, double-click the sweep step in the pipeline graph, select the Child jobs tab in the details panel, and then select the child job.
On the child job page, select the Trials tab to see and compare metrics for all the child runs. Select any of the child runs to see the details for that run.
If a child run failed, you can select the Outputs + logs tab on the child run page to see useful debug information.