Configure automated ML experiments in Python

APPLIES TO: Basic edition, Enterprise edition

In this guide, learn how to define various configuration settings of your automated machine learning experiments with the Azure Machine Learning SDK. Automated machine learning picks an algorithm and hyperparameters for you and generates a model ready for deployment. There are several options that you can use to configure automated machine learning experiments.

To view examples of automated machine learning experiments, see Tutorial: Train a classification model with automated machine learning or Train models with automated machine learning in the cloud.

Configuration options available in automated machine learning:

  • Select your experiment type: Classification, Regression, or Time Series Forecasting
  • Data source, formats, and fetch data
  • Choose your compute target: local or remote
  • Automated machine learning experiment settings
  • Run an automated machine learning experiment
  • Explore model metrics
  • Register and deploy model

If you prefer a no-code experience, you can also create your automated machine learning experiments in Azure Machine Learning studio.

Select your experiment type

Before you begin your experiment, you should determine the kind of machine learning problem you are solving. Automated machine learning supports task types of classification, regression, and forecasting. Learn more about task types.

Automated machine learning supports the following algorithms during the automation and tuning process. As a user, there is no need for you to specify the algorithm.

Classification | Regression | Time Series Forecasting
Logistic Regression | Elastic Net | Elastic Net
Light GBM | Light GBM | Light GBM
Gradient Boosting | Gradient Boosting | Gradient Boosting
Decision Tree | Decision Tree | Decision Tree
K Nearest Neighbors | K Nearest Neighbors | K Nearest Neighbors
Linear SVC | LARS Lasso | LARS Lasso
Support Vector Classification (SVC) | Stochastic Gradient Descent (SGD) | Stochastic Gradient Descent (SGD)
Random Forest | Random Forest | Random Forest
Extremely Randomized Trees | Extremely Randomized Trees | Extremely Randomized Trees
Xgboost | Xgboost | Xgboost
DNN Classifier | DNN Regressor | DNN Regressor
DNN Linear Classifier | Linear Regressor | Linear Regressor
Naive Bayes | Fast Linear Regressor | Auto-ARIMA
Stochastic Gradient Descent (SGD) | Online Gradient Descent Regressor | Prophet
Averaged Perceptron Classifier | | ForecastTCN
Linear SVM Classifier | |

Use the task parameter in the AutoMLConfig constructor to specify your experiment type.

from azureml.train.automl import AutoMLConfig

# task can be one of classification, regression, forecasting
automl_config = AutoMLConfig(task = "classification")

Data source and format

Automated machine learning supports data that resides on your local desktop or in the cloud such as Azure Blob Storage. The data can be read into a Pandas DataFrame or an Azure Machine Learning TabularDataset. Learn more about datasets.

Requirements for training data:

  • Data must be in tabular form.
  • The value to predict (the target column) must be in the data.

The following code examples demonstrate how to store the data in these formats.

  • TabularDataset

    from azureml.core.dataset import Dataset
    from azureml.opendatasets import Diabetes
    tabular_dataset = Diabetes.get_tabular_dataset()
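    # random_split puts roughly `percentage` of the rows in the first dataset, so 0.9 gives a 90/10 train/test split.
    train_dataset, test_dataset = tabular_dataset.random_split(percentage=0.9, seed=42)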
    label = "Y"
  • Pandas dataframe

    import pandas as pd
    from sklearn.model_selection import train_test_split
    df = pd.read_csv("your-local-file.csv")
    train_data, test_data = train_test_split(df, test_size=0.1, random_state=42)
    label = "label-col-name"

Fetch data for running experiment on remote compute

For remote executions, training data must be accessible from the remote compute. The Dataset class in the SDK exposes functionality to:

  • easily transfer data from static files or URL sources into your workspace
  • make your data available to training scripts when running on cloud compute resources

See the how-to for an example of using the Dataset class to mount data to your compute target.

Train and validation data

You can specify separate train and validation sets directly in the AutoMLConfig constructor.

K-Folds Cross Validation

Use the n_cross_validations setting to specify the number of cross-validation folds. The training data set is randomly split into n_cross_validations folds of equal size. During each cross-validation round, one of the folds is used to validate the model trained on the remaining folds. This process repeats for n_cross_validations rounds until each fold has been used once as the validation set. The average scores across all n_cross_validations rounds are reported, and the corresponding model is retrained on the whole training data set.

Monte Carlo Cross Validation (Repeated Random Sub-Sampling)

Use validation_size to specify the percentage of the training dataset that should be used for validation, and use n_cross_validations to specify the number of cross validations. During each cross validation round, a subset of size validation_size will be randomly selected for validation of the model trained on the remaining data. Finally, the average scores across all n_cross_validations rounds will be reported, and the corresponding model will be retrained on the whole training data set. Monte Carlo is not supported for time series forecasting.
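The difference between the two cross-validation schemes described above can be sketched in plain Python. This is only an illustration of how the row indices are split, not the SDK's implementation; the function names k_fold_splits and monte_carlo_splits are hypothetical.

```python
import random

def k_fold_splits(n_rows, n_folds, seed=0):
    """Yield (train_idx, valid_idx) pairs; each row is validated exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled indices gives folds whose sizes differ by at most one row.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        valid = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid

def monte_carlo_splits(n_rows, n_rounds, validation_size, seed=0):
    """Yield (train_idx, valid_idx) pairs; each round draws a fresh random subset."""
    rng = random.Random(seed)
    n_valid = max(1, round(n_rows * validation_size))
    for _ in range(n_rounds):
        valid = rng.sample(range(n_rows), n_valid)
        valid_set = set(valid)
        train = [i for i in range(n_rows) if i not in valid_set]
        yield train, valid

kfold = list(k_fold_splits(n_rows=10, n_folds=5))
monte_carlo = list(monte_carlo_splits(n_rows=10, n_rounds=5, validation_size=0.2))
```

With K-folds, every row lands in a validation set exactly once. With Monte Carlo, a row may be validated several times or not at all, because each round samples independently.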

Custom validation dataset

Use a custom validation dataset if a random split is not acceptable, which is usually the case with time series data or imbalanced data. The model is then evaluated against the validation dataset you specify instead of a randomly selected one.

Compute to run experiment

Next determine where the model will be trained. An automated machine learning training experiment can run on the following compute options:

  • Your local machine, such as a local desktop or laptop. This is generally suitable when you have a small dataset and are still in the exploration stage.

  • A remote machine in the cloud. Azure Machine Learning Managed Compute is a managed service that lets you train machine learning models on clusters of Azure virtual machines.

    See this GitHub site for examples of notebooks with local and remote compute targets.

  • An Azure Databricks cluster in your Azure subscription. You can find more details here - Setup Azure Databricks cluster for Automated ML

    See this GitHub site for examples of notebooks with Azure Databricks.

Configure your experiment settings

There are several options that you can use to configure your automated machine learning experiment. These parameters are set by instantiating an AutoMLConfig object. See the AutoMLConfig class for a full list of parameters.

Some examples include:

  1. Classification experiment using AUC weighted as the primary metric, with the experiment timeout set to 30 minutes and two cross-validation folds.

  2. Regression experiment set to end after 60 minutes, with five cross-validation folds:

    automl_regressor = AutoMLConfig(
        task='regression', experiment_timeout_minutes=60, n_cross_validations=5,
        whitelist_models=['kNN regressor'],
        training_data=train_data, label_column_name=label)

The task parameter value (classification, regression, or forecasting, where forecasting uses an algorithm pool similar to regression) determines the list of models to apply. Use the whitelist_models or blacklist_models parameters to include or exclude specific models from the iterations. The list of supported models can be found in the SupportedModels class for classification, forecasting, and regression.

Automated ML's validation service requires experiment_timeout_minutes to be at least 15 minutes, to help avoid experiment timeout failures.
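The first example in the list above (classification, AUC weighted, 30-minute timeout, two folds) can be sketched as a plain dictionary of settings. The check_timeout helper is hypothetical and only mirrors the 15-minute minimum stated above; it is not part of the SDK.

```python
# Minimum value this guide states for experiment_timeout_minutes.
MIN_TIMEOUT_MINUTES = 15

def check_timeout(settings):
    """Hypothetical helper: reject a timeout below the documented minimum."""
    timeout = settings.get("experiment_timeout_minutes")
    if timeout is not None and timeout < MIN_TIMEOUT_MINUTES:
        raise ValueError(
            f"experiment_timeout_minutes must be at least {MIN_TIMEOUT_MINUTES}, got {timeout}"
        )
    return settings

# Example 1: AUC weighted primary metric, 30-minute timeout, 2 cross-validation folds.
classification_settings = check_timeout({
    "task": "classification",
    "primary_metric": "AUC_weighted",
    "experiment_timeout_minutes": 30,
    "n_cross_validations": 2,
})

# With the SDK installed, and train_data / label defined as shown earlier:
# automl_classifier = AutoMLConfig(training_data=train_data,
#                                  label_column_name=label,
#                                  **classification_settings)
```

Keeping the settings in a dictionary makes it easy to reuse them across experiments and to validate them before submitting a run.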

Primary Metric

The primary metric determines the metric to be used during model training for optimization. The metrics you can select are determined by the task type you choose; the following table shows valid primary metrics for each task type.

Classification | Regression | Time Series Forecasting
accuracy | spearman_correlation | spearman_correlation
AUC_weighted | normalized_root_mean_squared_error | normalized_root_mean_squared_error
average_precision_score_weighted | r2_score | r2_score
norm_macro_recall | normalized_mean_absolute_error | normalized_mean_absolute_error

Learn about the specific definitions of these metrics in Understand automated machine learning results.
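As a quick sanity check before building a configuration, the table above can be encoded as a lookup. This is a minimal sketch: the metric names are copied from the table, and validate_primary_metric is a hypothetical helper rather than an SDK function.

```python
# Valid primary metrics per task type, copied from the table above.
VALID_PRIMARY_METRICS = {
    "classification": {
        "accuracy",
        "AUC_weighted",
        "average_precision_score_weighted",
        "norm_macro_recall",
    },
    "regression": {
        "spearman_correlation",
        "normalized_root_mean_squared_error",
        "r2_score",
        "normalized_mean_absolute_error",
    },
    "forecasting": {
        "spearman_correlation",
        "normalized_root_mean_squared_error",
        "r2_score",
        "normalized_mean_absolute_error",
    },
}

def validate_primary_metric(task, primary_metric):
    """Hypothetical helper: return the metric if it is valid for the task type."""
    if task not in VALID_PRIMARY_METRICS:
        raise ValueError(f"Unknown task type: {task!r}")
    if primary_metric not in VALID_PRIMARY_METRICS[task]:
        allowed = ", ".join(sorted(VALID_PRIMARY_METRICS[task]))
        raise ValueError(
            f"{primary_metric!r} is not valid for {task}; choose one of: {allowed}"
        )
    return primary_metric

metric = validate_primary_metric("regression", "r2_score")
```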

Data featurization

In every automated machine learning experiment, your data is automatically scaled and normalized to help certain algorithms that are sensitive to features that are on different scales. However, you can also enable additional featurization, such as missing values imputation, encoding, and transforms. Learn more about what featurization is included.

When configuring your experiments, you can enable the advanced setting featurization. The following table shows the accepted settings for featurization in the AutoMLConfig class.

Featurization Configuration | Description
"featurization": 'FeaturizationConfig' | Indicates customized featurization step should be used. Learn how to customize featurization.
"featurization": 'off' | Indicates featurization step should not be done automatically.
"featurization": 'auto' | Indicates that as part of preprocessing, data guardrails and featurization steps are performed automatically.


Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.

Time Series Forecasting

The time series forecasting task requires additional parameters in the configuration object:

  1. time_column_name: Required parameter that defines the name of the column in your training data containing a valid time-series.
  2. max_horizon: Defines the length of time you want to predict out based on the periodicity of the training data. For example if you have training data with daily time grains, you define how far out in days you want the model to train for.
  3. grain_column_names: Defines the name of columns that contain individual time series data in your training data. For example, if you are forecasting sales of a particular brand by store, you would define store and brand columns as your grain columns. Separate time-series and forecasts will be created for each grain/grouping.

For examples of the settings used below, see the sample notebook.

# Setting Store and Brand as grains for training.
grain_column_names = ['Store', 'Brand']
nseries = data.groupby(grain_column_names).ngroups

# View the number of time series data with defined grains
print('Data contains {0} individual time-series.'.format(nseries))
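time_series_settings = {
    'time_column_name': time_column_name,
    'grain_column_names': grain_column_names,
    'drop_column_names': ['logQuantity'],
    'max_horizon': n_test_periods
}

# train_data and label are the training data and target column name defined earlier.
automl_config = AutoMLConfig(task='forecasting',
                             training_data=train_data,
                             label_column_name=label,
                             **time_series_settings)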
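To make the grain and horizon settings concrete, here is a small, library-free sketch of how rows separate into one series per grain, with the last max_horizon periods of each series held out. The column names WeekStarting and Quantity and the sample rows are invented for illustration, and holding out the final max_horizon periods as test data is an assumption about how you might split your own data, not SDK behavior.

```python
from collections import defaultdict

# Invented sample data: weekly quantities for three store/brand combinations.
rows = [
    {"Store": 1, "Brand": "A", "WeekStarting": "2020-01-06", "Quantity": 10},
    {"Store": 1, "Brand": "A", "WeekStarting": "2020-01-13", "Quantity": 12},
    {"Store": 1, "Brand": "A", "WeekStarting": "2020-01-20", "Quantity": 11},
    {"Store": 1, "Brand": "B", "WeekStarting": "2020-01-06", "Quantity": 7},
    {"Store": 1, "Brand": "B", "WeekStarting": "2020-01-13", "Quantity": 9},
    {"Store": 1, "Brand": "B", "WeekStarting": "2020-01-20", "Quantity": 8},
    {"Store": 2, "Brand": "A", "WeekStarting": "2020-01-06", "Quantity": 20},
    {"Store": 2, "Brand": "A", "WeekStarting": "2020-01-13", "Quantity": 22},
    {"Store": 2, "Brand": "A", "WeekStarting": "2020-01-20", "Quantity": 21},
]

grain_column_names = ["Store", "Brand"]
time_column_name = "WeekStarting"
max_horizon = 1  # forecast one period (here, one week) ahead

# Each distinct combination of grain values identifies one time series.
series = defaultdict(list)
for row in rows:
    key = tuple(row[column] for column in grain_column_names)
    series[key].append(row)

# Hold out the last max_horizon periods of every series for testing.
train, test = {}, {}
for key, points in series.items():
    points.sort(key=lambda r: r[time_column_name])  # ISO dates sort chronologically
    train[key] = points[:-max_horizon]
    test[key] = points[-max_horizon:]

print("Data contains {0} individual time-series.".format(len(series)))
```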

Ensemble configuration

Ensemble models are enabled by default, and appear as the final run iterations in an automated machine learning run. Currently supported ensemble methods are voting and stacking. Voting is implemented as soft voting using weighted averages. Stacking uses a two-layer implementation: the first layer has the same models as the voting ensemble, and the second-layer model is used to find the optimal combination of the models from the first layer. If you are using ONNX models, or have model explainability enabled, stacking is disabled and only voting is used.
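Soft voting with weighted averages can be illustrated in a few lines. This is a generic sketch of the technique, not the SDK's implementation; the probabilities and weights are invented, and soft_vote is a hypothetical helper.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities for a single sample."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]

# Class probabilities predicted by three models for one sample (two classes).
model_probs = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
]
weights = [0.5, 0.3, 0.2]

averaged = soft_vote(model_probs, weights)
# The ensemble predicts the class with the highest averaged probability.
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
```

Unlike hard voting, which counts each model's predicted label, soft voting lets a confident, highly weighted model outweigh several less certain ones.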

There are multiple default arguments that can be provided as kwargs in an AutoMLConfig object to alter the default stack ensemble behavior.

  • stack_meta_learner_type: the meta-learner is a model trained on the output of the individual heterogeneous models. Default meta-learners are LogisticRegression for classification tasks (or LogisticRegressionCV if cross-validation is enabled) and ElasticNet for regression/forecasting tasks (or ElasticNetCV if cross-validation is enabled). This parameter can be one of the following strings: LogisticRegression, LogisticRegressionCV, LightGBMClassifier, ElasticNet, ElasticNetCV, LightGBMRegressor, or LinearRegression.
  • stack_meta_learner_train_percentage: specifies the proportion of the training set (when choosing train and validation type of training) to be reserved for training the meta-learner. Default value is 0.2.
  • stack_meta_learner_kwargs: optional parameters to pass to the initializer of the meta-learner. These parameters and parameter types mirror the parameters and parameter types from the corresponding model constructor, and are forwarded to the model constructor.

The following code shows an example of specifying custom ensemble behavior in an AutoMLConfig object.

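ensemble_settings = {
    "stack_meta_learner_type": "LogisticRegressionCV",
    "stack_meta_learner_train_percentage": 0.3,
    "stack_meta_learner_kwargs": {
        "refit": True,
        "fit_intercept": False,
        "class_weight": "balanced",
        "multi_class": "auto",
        "n_jobs": -1
    }
}

# Pass the ensemble settings as keyword arguments alongside the usual settings.
automl_classifier = AutoMLConfig(
    task='classification',
    training_data=train_data,
    label_column_name=label,
    **ensemble_settings)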

Ensemble training is enabled by default, but it can be disabled by using the enable_voting_ensemble and enable_stack_ensemble boolean parameters.

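automl_classifier = AutoMLConfig(task='classification', training_data=train_data, label_column_name=label,
                                 enable_voting_ensemble=False, enable_stack_ensemble=False)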

Run experiment

For automated ML, you create an Experiment object, which is a named object in a Workspace used to run experiments.

from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace

ws = Workspace.from_config()

# Choose a name for the experiment and specify the project folder.
experiment_name = 'automl-classification'
project_folder = './sample_projects/automl-classification'

experiment = Experiment(ws, experiment_name)

Submit the experiment to run and generate a model by passing the AutoMLConfig to the submit method.

run = experiment.submit(automl_config, show_output=True)


Dependencies are first installed on a new machine. It may take up to 10 minutes before output is shown. Setting show_output to True results in output being shown on the console.

Exit Criteria

There are a few options you can define to end your experiment.

  1. No Criteria: If you do not define any exit parameters the experiment will continue until no further progress is made on your primary metric.
  2. Exit after a length of time: Use experiment_timeout_minutes in your settings to define how long, in minutes, the experiment should continue to run.
  3. Exit after a score has been reached: Using experiment_exit_score will complete the experiment after a primary metric score has been reached.
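The timeout and score criteria above amount to a simple check after each iteration. The sketch below is only a conceptual illustration, not how the service is implemented; should_stop is a hypothetical function, and it assumes a primary metric where higher values are better.

```python
def should_stop(elapsed_minutes, best_score, timeout_minutes=None, exit_score=None):
    """Return True when either configured exit criterion has been met.

    Assumes a higher primary metric score is better (for example, accuracy).
    """
    if timeout_minutes is not None and elapsed_minutes >= timeout_minutes:
        return True
    if exit_score is not None and best_score is not None and best_score >= exit_score:
        return True
    return False

# 20 minutes in, best score 0.91: below the 0.95 target and within the 30-minute timeout.
keep_going = not should_stop(20, 0.91, timeout_minutes=30, exit_score=0.95)
# The target score has been reached, so the experiment would end early.
stop_on_score = should_stop(20, 0.96, timeout_minutes=30, exit_score=0.95)
# The timeout has elapsed, regardless of score.
stop_on_time = should_stop(30, 0.91, timeout_minutes=30, exit_score=0.95)
```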

Explore model metrics

You can view your training results in a widget or inline if you are in a notebook. See Track and evaluate models for more details.

Understand automated ML models

Any model produced using automated ML includes the following steps:

  • Automated feature engineering (if "featurization": 'auto')
  • Scaling/Normalization and algorithm with hyperparameter values

You can get this information from the fitted_model output of automated ML.

automl_config = AutoMLConfig(…)
automl_run = experiment.submit(automl_config …)
best_run, fitted_model = automl_run.get_output()

Automated feature engineering

See the list of preprocessing and automated feature engineering that happens when "featurization": 'auto'.

Consider this example:

  • There are four input features: A (Numeric), B (Numeric), C (Numeric), D (DateTime)
  • Numeric feature C is dropped because it is an ID column with all unique values
  • Numeric features A and B have missing values and hence are imputed by the mean
  • DateTime feature D is featurized into 11 different engineered features

Use these two APIs on the first step of the fitted model to understand more. See this sample notebook.

  • API 1: get_engineered_feature_names() returns a list of engineered feature names.


    fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()
    Output: ['A', 'B', 'A_WASNULL', 'B_WASNULL', 'year', 'half', 'quarter', 'month', 'day', 'hour', 'am_pm', 'hour12', 'wday', 'qday', 'week']

    This list includes all engineered feature names.


    Use 'timeseriestransformer' for task='forecasting'; otherwise use 'datatransformer' for a 'regression' or 'classification' task.

  • API 2: get_featurization_summary() returns a featurization summary for all the input features.

    fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()

    Use 'timeseriestransformer' for task='forecasting'; otherwise use 'datatransformer' for a 'regression' or 'classification' task.

    Output:

    [{'RawFeatureName': 'A',
      'TypeDetected': 'Numeric',
      'Dropped': 'No',
      'EngineeredFeatureCount': 2,
      'Transformations': ['MeanImputer', 'ImputationMarker']},
     {'RawFeatureName': 'B',
      'TypeDetected': 'Numeric',
      'Dropped': 'No',
      'EngineeredFeatureCount': 2,
      'Transformations': ['MeanImputer', 'ImputationMarker']},
     {'RawFeatureName': 'C',
      'TypeDetected': 'Numeric',
      'Dropped': 'Yes',
      'EngineeredFeatureCount': 0,
      'Transformations': []},
     {'RawFeatureName': 'D',
      'TypeDetected': 'DateTime',
      'Dropped': 'No',
      'EngineeredFeatureCount': 11,
      'Transformations': ['DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime']}]


    Output | Definition
    RawFeatureName | Input feature/column name from the dataset provided.
    TypeDetected | Detected datatype of the input feature.
    Dropped | Indicates if the input feature was dropped or used.
    EngineeredFeatureCount | Number of features generated through automated feature engineering transforms.
    Transformations | List of transformations applied to input features to generate engineered features.
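The two outputs can be cross-checked against each other: the engineered feature counts in the summary should add up to the number of engineered feature names. A minimal sketch, using a hand-copied subset of the fields from the example summary above rather than a live fitted model:

```python
# Subset of the fields from the example featurization summary above.
summary = [
    {"RawFeatureName": "A", "Dropped": "No", "EngineeredFeatureCount": 2},
    {"RawFeatureName": "B", "Dropped": "No", "EngineeredFeatureCount": 2},
    {"RawFeatureName": "C", "Dropped": "Yes", "EngineeredFeatureCount": 0},
    {"RawFeatureName": "D", "Dropped": "No", "EngineeredFeatureCount": 11},
]

# Engineered feature names from the API 1 example output.
engineered_names = ['A', 'B', 'A_WASNULL', 'B_WASNULL', 'year', 'half', 'quarter',
                    'month', 'day', 'hour', 'am_pm', 'hour12', 'wday', 'qday', 'week']

# Raw features that were dropped during featurization.
dropped = [f["RawFeatureName"] for f in summary if f["Dropped"] == "Yes"]
# Total number of engineered features across all raw features.
total_engineered = sum(f["EngineeredFeatureCount"] for f in summary)

print("Dropped features:", dropped)
print("Engineered feature count:", total_engineered)
```

Here the counts (2 + 2 + 0 + 11) match the 15 names returned by get_engineered_feature_names() in the example.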

Customize feature engineering

To customize feature engineering, specify "featurization": FeaturizationConfig.

Supported customization includes:

Customization | Definition
Column purpose update | Override feature type for the specified column.
Transformer parameter update | Update parameters for the specified transformer. Currently supports Imputer (mean, most frequent & median) and HashOneHotEncoder.
Drop columns | Columns to drop from being featurized.
Block transformers | Block transformers to be used on featurization process.

Create the FeaturizationConfig object using API calls:

from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
featurization_config.blocked_transformers = ['LabelEncoder']
featurization_config.drop_columns = ['aspiration', 'stroke']
featurization_config.add_column_purpose('engine-size', 'Numeric')
featurization_config.add_column_purpose('body-style', 'CategoricalHash')
# The default imputation strategy is mean; override the Imputer parameters for three columns.
featurization_config.add_transformer_params('Imputer', ['engine-size'], {"strategy": "median"})
featurization_config.add_transformer_params('Imputer', ['city-mpg'], {"strategy": "median"})
featurization_config.add_transformer_params('Imputer', ['bore'], {"strategy": "most_frequent"})
featurization_config.add_transformer_params('HashOneHotEncoder', [], {"number_of_bits": 3})

Scaling/Normalization and algorithm with hyperparameter values:

To understand the scaling/normalization and algorithm/hyperparameter values for a pipeline, use fitted_model.steps. Learn more about scaling/normalization. Here is a sample output:

[('RobustScaler', RobustScaler(copy=True, quantile_range=[10, 90], with_centering=True, with_scaling=True)), ('LogisticRegression', LogisticRegression(C=0.18420699693267145, class_weight='balanced', dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=None, solver='newton-cg', tol=0.0001, verbose=0, warm_start=False))]

To get more details, use this helper function shown in this sample notebook.

from pprint import pprint

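def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            # Ensemble step: list member names and weights, then recurse into each member.
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            # Single transformer or estimator: print its hyperparameters.
            pprint(step[1].get_params())

print_model(fitted_model)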


The following sample output is for a pipeline using a specific algorithm (LogisticRegression with RobustScaler, in this case).

{'copy': True,
'quantile_range': [10, 90],
'with_centering': True,
'with_scaling': True}

{'C': 0.18420699693267145,
'class_weight': 'balanced',
'dual': False,
'fit_intercept': True,
'intercept_scaling': 1,
'max_iter': 100,
'multi_class': 'multinomial',
'n_jobs': 1,
'penalty': 'l2',
'random_state': None,
'solver': 'newton-cg',
'tol': 0.0001,
'verbose': 0,
'warm_start': False}

Predict class probability

Models produced using automated ML all have wrapper objects that mirror functionality from their open-source origin class. Most classification model wrapper objects returned by automated ML implement the predict_proba() function, which accepts an array-like or sparse matrix data sample of your features (X values), and returns an n-dimensional array of each sample and its respective class probability.

Assuming you have retrieved the best run and fitted model using the same calls from above, you can call predict_proba() directly from the fitted model, supplying an X_test sample in the appropriate format depending on the model type.

best_run, fitted_model = automl_run.get_output()
class_prob = fitted_model.predict_proba(X_test)

If the underlying model does not support the predict_proba() function or the format is incorrect, a model class-specific exception will be thrown. See the RandomForestClassifier and XGBoost reference docs for examples of how this function is implemented for different model types.

Model interpretability

Model interpretability allows you to understand why your models made predictions, and the underlying feature importance values. The SDK includes various packages for enabling model interpretability features, both at training and inference time, for local and deployed models.

See the how-to for code samples on how to enable interpretability features specifically within automated machine learning experiments.

For general information on how model explanations and feature importance can be enabled in other areas of the SDK outside of automated machine learning, see the concept article on interpretability.

Next steps

Learn more about how and where to deploy a model.

Learn more about how to train a regression model with Automated machine learning or how to train using Automated machine learning on a remote resource.