Configure automated ML experiments in Python

APPLIES TO: Basic edition, Enterprise edition

In this guide, learn how to define various configuration settings of your automated machine learning experiments with the Azure Machine Learning SDK. Automated machine learning picks an algorithm and hyperparameters for you and generates a model ready for deployment. There are several options that you can use to configure automated machine learning experiments.

To view examples of automated machine learning experiments, see Tutorial: Train a classification model with automated machine learning or Train models with automated machine learning in the cloud.

Configuration options available in automated machine learning:

  • Select your experiment type: Classification, Regression, or Time Series Forecasting
  • Data source, formats, and fetch data
  • Choose your compute target: local or remote
  • Automated machine learning experiment settings
  • Run an automated machine learning experiment
  • Explore model metrics
  • Register and deploy model

If you prefer a no-code experience, you can also create your automated machine learning experiments in Azure Machine Learning studio.

Select your experiment type

Before you begin your experiment, you should determine the kind of machine learning problem you are solving. Automated machine learning supports task types of classification, regression, and forecasting. Learn more about task types.

Automated machine learning supports the following algorithms during the automation and tuning process. As a user, there is no need for you to specify the algorithm.


If you plan to export your automated ML models to ONNX, only the algorithms marked with an * can be converted to the ONNX format. Learn more about converting models to ONNX.

Also note, ONNX only supports classification and regression tasks at this time.

Classification | Regression | Time Series Forecasting
Logistic Regression* | Elastic Net* | Elastic Net
Light GBM* | Light GBM* | Light GBM
Gradient Boosting* | Gradient Boosting* | Gradient Boosting
Decision Tree* | Decision Tree* | Decision Tree
K Nearest Neighbors* | K Nearest Neighbors* | K Nearest Neighbors
Linear SVC* | LARS Lasso* | LARS Lasso
Support Vector Classification (SVC)* | Stochastic Gradient Descent (SGD)* | Stochastic Gradient Descent (SGD)
Random Forest* | Random Forest* | Random Forest
Extremely Randomized Trees* | Extremely Randomized Trees* | Extremely Randomized Trees
Xgboost* | Xgboost* | Xgboost
Averaged Perceptron Classifier | Online Gradient Descent Regressor | Auto-ARIMA
Naive Bayes* | Fast Linear Regressor | Prophet
Stochastic Gradient Descent (SGD)* | | ForecastTCN
Linear SVM Classifier* | |

Use the task parameter in the AutoMLConfig constructor to specify your experiment type.

from azureml.train.automl import AutoMLConfig

# task can be one of classification, regression, forecasting
automl_config = AutoMLConfig(task = "classification")

Data source and format

Automated machine learning supports data that resides on your local desktop or in the cloud such as Azure Blob Storage. The data can be read into a Pandas DataFrame or an Azure Machine Learning TabularDataset. Learn more about datasets.

Requirements for training data:

  • Data must be in tabular form.
  • The value to predict (the target column) must be in the data.

The following code examples demonstrate how to store the data in these formats.

  • TabularDataset

    from azureml.core.dataset import Dataset
    from azureml.opendatasets import Diabetes
    tabular_dataset = Diabetes.get_tabular_dataset()
    # random_split returns two datasets; the first receives roughly `percentage` of the rows.
    train_dataset, test_dataset = tabular_dataset.random_split(percentage=0.9, seed=42)
    label = "Y"
  • Pandas dataframe

    import pandas as pd
    from sklearn.model_selection import train_test_split
    df = pd.read_csv("your-local-file.csv")
    train_data, test_data = train_test_split(df, test_size=0.1, random_state=42)
    label = "label-col-name"

Fetch data for running experiment on remote compute

For remote executions, training data must be accessible from the remote compute. The Dataset class in the SDK exposes functionality to:

  • easily transfer data from static files or URL sources into your workspace
  • make your data available to training scripts when running on cloud compute resources

See the how-to for an example of using the Dataset class to mount data to your compute target.

Train and validation data

You can specify separate train and validation sets directly in the AutoMLConfig constructor with the following options. Learn more about how to configure data splits and cross validation for your AutoML experiments.

K-Folds Cross Validation

Use the n_cross_validations setting to specify the number of cross validations. The training data set is randomly split into n_cross_validations folds of equal size. During each cross validation round, one fold is used to validate the model trained on the remaining folds. This process repeats for n_cross_validations rounds until each fold has been used once as the validation set. The average scores across all n_cross_validations rounds are reported, and the corresponding model is retrained on the whole training data set.
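The fold mechanics described above can be sketched in plain Python. This is an illustration only, not the SDK's implementation; the helper name k_fold_indices and the seed argument are hypothetical.

```python
import random

def k_fold_indices(n_rows, n_cross_validations, seed=0):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Slice the shuffled row indices into n_cross_validations nearly equal folds.
    folds = [indices[i::n_cross_validations] for i in range(n_cross_validations)]
    for k, validation in enumerate(folds):
        # Every fold except fold k is used for training in round k.
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_rows=10, n_cross_validations=5))
for round_number, (train, validation) in enumerate(splits, start=1):
    print(round_number, sorted(validation))
```

Each row index lands in exactly one validation fold, so every row is used for validation once and for training in all other rounds.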

Learn more about how autoML applies cross validation to prevent over-fitting models.

Monte Carlo Cross Validation (Repeated Random Sub-Sampling)

Use validation_size to specify the percentage of the training dataset that should be used for validation, and use n_cross_validations to specify the number of cross validations. During each cross validation round, a subset of size validation_size will be randomly selected for validation of the model trained on the remaining data. Finally, the average scores across all n_cross_validations rounds will be reported, and the corresponding model will be retrained on the whole training data set. Monte Carlo is not supported for time series forecasting.
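As a rough illustration of repeated random sub-sampling, the following plain-Python sketch draws a fresh random validation subset in every round. It is not the SDK's implementation; the helper name monte_carlo_splits and the seed argument are hypothetical.

```python
import random

def monte_carlo_splits(n_rows, validation_size, n_cross_validations, seed=0):
    """Return a list of (train_indices, validation_indices), one pair per round."""
    rng = random.Random(seed)
    n_validation = max(1, round(n_rows * validation_size))
    all_rows = range(n_rows)
    rounds = []
    for _ in range(n_cross_validations):
        # Each round samples its validation rows independently of earlier rounds.
        validation = set(rng.sample(all_rows, n_validation))
        train = [i for i in all_rows if i not in validation]
        rounds.append((train, sorted(validation)))
    return rounds

mc_splits = monte_carlo_splits(n_rows=20, validation_size=0.2, n_cross_validations=3)
for train, validation in mc_splits:
    print(len(train), validation)
```

Unlike K-fold splitting, a row can be selected for validation in several rounds, or in none.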

Custom validation dataset

Use a custom validation dataset if a random split is not acceptable, which is usually the case for time series data or imbalanced data. You specify your own validation dataset, and the model is evaluated against it instead of a randomly selected dataset. Learn more about how to configure a custom validation set with the SDK.
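For time series data, one common way to build such a validation set is to hold out the most recent rows instead of sampling at random. The sketch below assumes records stored as dictionaries with a "date" key; the data and the helper name chronological_split are hypothetical, and the way the resulting set is handed to the SDK is covered in the linked article.

```python
from datetime import date, timedelta

# Hypothetical daily records; in practice these would come from your dataset.
rows = [{"date": date(2020, 1, 1) + timedelta(days=i), "sales": 100 + i} for i in range(10)]

def chronological_split(records, time_key, n_validation):
    """Hold out the n_validation most recent records (n_validation must be > 0)."""
    ordered = sorted(records, key=lambda record: record[time_key])
    return ordered[:-n_validation], ordered[-n_validation:]

train_rows, validation_rows = chronological_split(rows, "date", n_validation=3)
print(train_rows[-1]["date"], validation_rows[0]["date"])
```

Because every validation row is later than every training row, the evaluation never uses information from the future.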

Compute to run experiment

Next determine where the model will be trained. An automated machine learning training experiment can run on the following compute options:

  • Your local machine such as a local desktop or laptop: generally used when you have a small dataset and you are still in the exploration stage.

  • A remote machine in the cloud: Azure Machine Learning Managed Compute is a managed service that lets you train machine learning models on clusters of Azure virtual machines.

    See this GitHub site for examples of notebooks with local and remote compute targets.

  • An Azure Databricks cluster in your Azure subscription. For more details, see Setup Azure Databricks cluster for Automated ML.

    See this GitHub site for examples of notebooks with Azure Databricks.

Configure your experiment settings

There are several options that you can use to configure your automated machine learning experiment. These parameters are set by instantiating an AutoMLConfig object. See the AutoMLConfig class for a full list of parameters.

Some examples include:

  1. Classification experiment using AUC weighted as the primary metric with experiment timeout minutes set to 30 minutes and 2 cross-validation folds.

  2. Below is an example of a regression experiment set to end after 60 minutes with five cross-validation folds.

       automl_regressor = AutoMLConfig(task='regression',
                                       experiment_timeout_minutes=60, n_cross_validations=5)
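For the first example in the list above (classification with AUC weighted as the primary metric, a 30 minute timeout, and 2 cross-validation folds), the settings could be collected in a dictionary along these lines. This is a sketch: the dictionary name is hypothetical, and the training_data and label_column_name arguments in the comment are assumptions about how the prepared data would be passed.

```python
# Settings taken from the description of example 1; only the parameters it names are set.
classification_settings = {
    "task": "classification",
    "primary_metric": "AUC_weighted",
    "experiment_timeout_minutes": 30,
    "n_cross_validations": 2,
}

# With the SDK installed and training data prepared, these could be unpacked as:
# automl_classifier = AutoMLConfig(training_data=train_data,
#                                  label_column_name=label,
#                                  **classification_settings)
print(classification_settings)
```

Keeping the settings in a dictionary makes it easy to reuse or log them separately from the data arguments.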

The three different task parameter values (the third task type is forecasting, which uses a similar algorithm pool to regression tasks) determine the list of models to apply. Use the whitelist_models or blacklist_models parameters to include or exclude specific models from the iterations. The list of supported models can be found in the SupportedModels class for Classification, Forecasting, and Regression.

To help avoid experiment timeout failures, automated ML's validation service requires that experiment_timeout_minutes be set to at least 15 minutes, or at least 60 minutes if the number of rows multiplied by the number of columns exceeds 10 million.
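The timeout rule above can be checked before submitting an experiment. The following is a minimal sketch of that rule as stated in this article, not the validation service itself; the function names are hypothetical.

```python
LARGE_DATA_CELLS = 10_000_000  # rows * columns threshold stated above

def minimum_timeout_minutes(n_rows, n_columns):
    """Return the smallest accepted experiment_timeout_minutes for a dataset shape."""
    return 60 if n_rows * n_columns > LARGE_DATA_CELLS else 15

def check_timeout(experiment_timeout_minutes, n_rows, n_columns):
    """Raise ValueError if the timeout is below the minimum for this dataset shape."""
    required = minimum_timeout_minutes(n_rows, n_columns)
    if experiment_timeout_minutes < required:
        raise ValueError(
            f"experiment_timeout_minutes must be at least {required} "
            f"for {n_rows} rows x {n_columns} columns"
        )
    return experiment_timeout_minutes

print(minimum_timeout_minutes(1_000, 20))
print(minimum_timeout_minutes(1_000_000, 11))
```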

Primary Metric

The primary metric determines the metric to be used during model training for optimization. The metrics you can select are determined by the task type you choose; the following table shows valid primary metrics for each task type.

Classification | Regression | Time Series Forecasting
accuracy | spearman_correlation | spearman_correlation
AUC_weighted | normalized_root_mean_squared_error | normalized_root_mean_squared_error
average_precision_score_weighted | r2_score | r2_score
norm_macro_recall | normalized_mean_absolute_error | normalized_mean_absolute_error

Learn about the specific definitions of these metrics in Understand automated machine learning results.
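To catch a mismatched task and metric early, the table above can be encoded as a lookup. This sketch only mirrors the metrics listed in this article; the names VALID_PRIMARY_METRICS and is_valid_primary_metric are hypothetical, and the SDK may accept metrics not shown here.

```python
# Valid primary metrics per task type, copied from the table above.
VALID_PRIMARY_METRICS = {
    "classification": [
        "accuracy",
        "AUC_weighted",
        "average_precision_score_weighted",
        "norm_macro_recall",
    ],
    "regression": [
        "spearman_correlation",
        "normalized_root_mean_squared_error",
        "r2_score",
        "normalized_mean_absolute_error",
    ],
}
# Forecasting uses the same primary metrics as regression in the table.
VALID_PRIMARY_METRICS["forecasting"] = list(VALID_PRIMARY_METRICS["regression"])

def is_valid_primary_metric(task, metric):
    """Return True if the metric is listed for the given task type."""
    return metric in VALID_PRIMARY_METRICS.get(task, [])

print(is_valid_primary_metric("classification", "AUC_weighted"))
print(is_valid_primary_metric("regression", "accuracy"))
```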

Data featurization

In every automated machine learning experiment, your data is automatically scaled and normalized to help certain algorithms that are sensitive to features that are on different scales. However, you can also enable additional featurization, such as missing values imputation, encoding, and transforms.

When configuring your experiments in your AutoMLConfig object, you can enable/disable the setting featurization. The following table shows the accepted settings for featurization in the AutoMLConfig class.

Featurization Configuration | Description
"featurization": 'auto' | Indicates that as part of preprocessing, data guardrails and featurization steps are performed automatically. This is the default setting.
"featurization": 'off' | Indicates featurization step should not be done automatically.
"featurization": 'FeaturizationConfig' | Indicates customized featurization step should be used. Learn how to customize featurization.


Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.

Time Series Forecasting

The time series forecasting task requires additional parameters in the configuration object:

  1. time_column_name: Required parameter that defines the name of the column in your training data containing a valid time-series.
  2. forecast_horizon: Defines how many periods forward you would like to forecast. The integer horizon is in units of the timeseries frequency. For example if you have training data with daily frequency, you define how far out in days you want the model to train for.
  3. time_series_id_column_names: Defines the columns that uniquely identify the time series in data that has multiple rows with the same timestamp. For example, if you are forecasting sales of a particular brand by store, you would define store and brand columns as your time series identifiers. Separate forecasts will be created for each grouping. If the time series identifiers are not defined, the data set is assumed to be one time series.

For examples of the settings used below, see the sample notebook.

# Setting Store and Brand as time series identifiers for training.
time_series_id_column_names = ['Store', 'Brand']
nseries = data.groupby(time_series_id_column_names).ngroups

# View the number of time series data with defined time series identifiers
print('Data contains {0} individual time-series.'.format(nseries))
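time_series_settings = {
    'time_column_name': time_column_name,
    'time_series_id_column_names': time_series_id_column_names,
    'drop_column_names': ['logQuantity'],
    'forecast_horizon': n_test_periods
}

# Pass the time series settings as keyword arguments. Other arguments
# (training data, label column, and so on) are omitted here.
automl_config = AutoMLConfig(task='forecasting', **time_series_settings)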

Ensemble configuration

Ensemble models are enabled by default, and appear as the final run iterations in an automated machine learning run. Currently supported ensemble methods are voting and stacking. Voting is implemented as soft-voting using weighted averages, and the stacking implementation is using a two layer implementation, where the first layer has the same models as the voting ensemble, and the second layer model is used to find the optimal combination of the models from the first layer. If you are using ONNX models, or have model-explainability enabled, stacking will be disabled and only voting will be utilized.
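Soft voting with weighted averages can be illustrated in a few lines of plain Python. This is a sketch of the general technique, not the SDK's ensemble code; the function name soft_vote and the example probabilities and weights are made up for illustration.

```python
def soft_vote(probabilities_per_model, weights):
    """Average per-class probabilities across models using the given weights."""
    total_weight = sum(weights)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(w * probs[c] for probs, w in zip(probabilities_per_model, weights)) / total_weight
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    predicted_class = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted_class

# Class probabilities for one sample from three hypothetical models.
model_probabilities = [
    [0.9, 0.1],
    [0.4, 0.6],
    [0.3, 0.7],
]
model_weights = [0.5, 0.25, 0.25]

averaged, predicted = soft_vote(model_probabilities, model_weights)
print(averaged, predicted)
```

In this example two of the three models favor class 1, but the heavily weighted, confident first model pulls the averaged probability toward class 0. That is the difference between soft voting and a simple majority vote.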

There are multiple default arguments that can be provided as kwargs in an AutoMLConfig object to alter the default ensemble behavior.

  • ensemble_download_models_timeout_sec: During VotingEnsemble and StackEnsemble model generation, multiple fitted models from the previous child runs are downloaded. If you encounter this error: AutoMLEnsembleException: Could not find any models for running ensembling, then you may need to provide more time for the models to be downloaded. The default value is 300 seconds for downloading these models in parallel and there is no maximum timeout limit. Configure this parameter with a higher value than 300 secs, if more time is needed.


    If the timeout is reached and some models have been downloaded, ensembling proceeds with as many models as it has downloaded. Not all models need to finish downloading within the timeout.

The following parameters only apply to StackEnsemble models:

  • stack_meta_learner_type: the meta-learner is a model trained on the output of the individual heterogeneous models. Default meta-learners are LogisticRegression for classification tasks (or LogisticRegressionCV if cross-validation is enabled) and ElasticNet for regression/forecasting tasks (or ElasticNetCV if cross-validation is enabled). This parameter can be one of the following strings: LogisticRegression, LogisticRegressionCV, LightGBMClassifier, ElasticNet, ElasticNetCV, LightGBMRegressor, or LinearRegression.

  • stack_meta_learner_train_percentage: specifies the proportion of the training set (when choosing train and validation type of training) to be reserved for training the meta-learner. Default value is 0.2.

  • stack_meta_learner_kwargs: optional parameters to pass to the initializer of the meta-learner. These parameters and parameter types mirror the parameters and parameter types from the corresponding model constructor, and are forwarded to the model constructor.

The following code shows an example of specifying custom ensemble behavior in an AutoMLConfig object.

ensemble_settings = {
    "ensemble_download_models_timeout_sec": 600,
    "stack_meta_learner_type": "LogisticRegressionCV",
    "stack_meta_learner_train_percentage": 0.3,
    "stack_meta_learner_kwargs": {
        "refit": True,
        "fit_intercept": False,
        "class_weight": "balanced",
        "multi_class": "auto",
        "n_jobs": -1
    }
}

# Remaining arguments (training data, label column, and so on) are omitted here.
automl_classifier = AutoMLConfig(task='classification', **ensemble_settings)

Ensemble training is enabled by default, but it can be disabled by using the enable_voting_ensemble and enable_stack_ensemble boolean parameters.

automl_classifier = AutoMLConfig(task='classification',
                                 enable_voting_ensemble=False, enable_stack_ensemble=False)

Run experiment

For automated ML, you create an Experiment object, which is a named object in a Workspace used to run experiments.

from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace

ws = Workspace.from_config()

# Choose a name for the experiment and specify the project folder.
experiment_name = 'automl-classification'
project_folder = './sample_projects/automl-classification'

experiment = Experiment(ws, experiment_name)

Submit the experiment to run and generate a model. Pass the AutoMLConfig to the submit method to generate the model.

run = experiment.submit(automl_config, show_output=True)


Dependencies are first installed on a new machine. It may take up to 10 minutes before output is shown. Setting show_output to True results in output being shown on the console.

Exit criteria

There are a few options you can define to end your experiment.

  1. No Criteria: If you do not define any exit parameters the experiment will continue until no further progress is made on your primary metric.
  2. Exit after a length of time: Using experiment_timeout_minutes in your settings allows you to define how long, in minutes, an experiment should continue to run.
  3. Exit after a score has been reached: Using experiment_exit_score will complete the experiment after a primary metric score has been reached.
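The way the timeout and exit score criteria interact can be sketched as a simple check. This is an illustration of the rules listed above, not the service's actual logic; the function name should_stop is hypothetical, and the sketch assumes a primary metric where higher is better.

```python
def should_stop(elapsed_minutes, best_score,
                experiment_timeout_minutes=None, experiment_exit_score=None):
    """Return (stop, reason) for the exit criteria that are configured."""
    if experiment_timeout_minutes is not None and elapsed_minutes >= experiment_timeout_minutes:
        return True, "timeout"
    # Assumption: the primary metric is maximized, so reaching the score means >=.
    if (experiment_exit_score is not None and best_score is not None
            and best_score >= experiment_exit_score):
        return True, "exit score reached"
    return False, "continue"

print(should_stop(45, 0.91, experiment_timeout_minutes=60, experiment_exit_score=0.95))
print(should_stop(45, 0.96, experiment_timeout_minutes=60, experiment_exit_score=0.95))
print(should_stop(60, 0.91, experiment_timeout_minutes=60, experiment_exit_score=0.95))
```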

Explore model metrics

You can view your training results in a widget or inline if you are in a notebook. See Track and evaluate models for more details.

For details on how to download or register a model for deployment to a web service, see how and where to deploy a model.

Understand automated ML models

Any model produced using automated ML includes the following steps:

  • Automated feature engineering (if "featurization": 'auto')
  • Scaling/Normalization and algorithm with hyperparameter values

You can get this information from the fitted_model output of automated ML.

automl_config = AutoMLConfig(…)
automl_run = experiment.submit(automl_config …)
best_run, fitted_model = automl_run.get_output()

Automated feature engineering

See the list of preprocessing and automated feature engineering that happens when "featurization": 'auto'.

Consider this example:

  • There are four input features: A (Numeric), B (Numeric), C (Numeric), D (DateTime)
  • Numeric feature C is dropped because it is an ID column with all unique values
  • Numeric features A and B have missing values and hence are imputed by the mean
  • DateTime feature D is featurized into 11 different engineered features
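Mean imputation with a missing-value marker, as applied to features A and B in this example, can be sketched in plain Python. This illustrates the general technique only; the helper name impute_with_marker and the sample values are hypothetical, and the SDK's own transformers may differ in detail.

```python
from statistics import mean

def impute_with_marker(values):
    """Replace None with the mean of observed values and flag which rows were missing."""
    observed = [v for v in values if v is not None]
    fill_value = mean(observed)
    imputed = [fill_value if v is None else v for v in values]
    was_null = [int(v is None) for v in values]
    return imputed, was_null

feature_a = [1.0, None, 3.0, 5.0]
a_imputed, a_wasnull = impute_with_marker(feature_a)
print(a_imputed)
print(a_wasnull)
```

One raw feature therefore yields two engineered features: the imputed values and the marker column.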

Use these two APIs on the first step of the fitted model to learn more. See this sample notebook.

  • API 1: get_engineered_feature_names() returns a list of engineered feature names.


    fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()
    Output: ['A', 'B', 'A_WASNULL', 'B_WASNULL', 'year', 'half', 'quarter', 'month', 'day', 'hour', 'am_pm', 'hour12', 'wday', 'qday', 'week']

    This list includes all engineered feature names.


    Use 'timeseriestransformer' for task='forecasting', else use 'datatransformer' for 'regression' or 'classification' task.

  • API 2: get_featurization_summary() returns featurization summary for all the input features.

    fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()

    Use 'timeseriestransformer' for task='forecasting', else use 'datatransformer' for 'regression' or 'classification' task.

    Output:

    [{'RawFeatureName': 'A',
      'TypeDetected': 'Numeric',
      'Dropped': 'No',
      'EngineeredFeatureCount': 2,
      'Transformations': ['MeanImputer', 'ImputationMarker']},
    {'RawFeatureName': 'B',
      'TypeDetected': 'Numeric',
      'Dropped': 'No',
      'EngineeredFeatureCount': 2,
      'Transformations': ['MeanImputer', 'ImputationMarker']},
    {'RawFeatureName': 'C',
      'TypeDetected': 'Numeric',
      'Dropped': 'Yes',
      'EngineeredFeatureCount': 0,
      'Transformations': []},
    {'RawFeatureName': 'D',
      'TypeDetected': 'DateTime',
      'Dropped': 'No',
      'EngineeredFeatureCount': 11,
      'Transformations': ['DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime']}]


    Output | Definition
    RawFeatureName | Input feature/column name from the dataset provided.
    TypeDetected | Detected datatype of the input feature.
    Dropped | Indicates if the input feature was dropped or used.
    EngineeredFeatureCount | Number of features generated through automated feature engineering transforms.
    Transformations | List of transformations applied to input features to generate engineered features.
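A summary in this shape is easy to post-process. The sketch below uses a hand-copied subset of the sample output above (in practice the list would come from get_featurization_summary()) to find dropped features and count engineered ones; the variable names are hypothetical.

```python
# Subset of the sample summary shown above, reduced to the fields used here.
featurization_summary = [
    {"RawFeatureName": "A", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 2},
    {"RawFeatureName": "B", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 2},
    {"RawFeatureName": "C", "TypeDetected": "Numeric", "Dropped": "Yes", "EngineeredFeatureCount": 0},
    {"RawFeatureName": "D", "TypeDetected": "DateTime", "Dropped": "No", "EngineeredFeatureCount": 11},
]

# Features that were removed before training.
dropped_features = [
    entry["RawFeatureName"] for entry in featurization_summary if entry["Dropped"] == "Yes"
]
# Total number of engineered features across all inputs.
total_engineered = sum(entry["EngineeredFeatureCount"] for entry in featurization_summary)

print(dropped_features)
print(total_engineered)
```

The total of 15 matches the number of names in the get_engineered_feature_names() sample output.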

Scaling/Normalization and algorithm with hyperparameter values:

To understand the scaling/normalization and algorithm/hyperparameter values for a pipeline, use fitted_model.steps. Learn more about scaling/normalization. Here is a sample output:

[('RobustScaler', RobustScaler(copy=True, quantile_range=[10, 90], with_centering=True, with_scaling=True)), ('LogisticRegression', LogisticRegression(C=0.18420699693267145, class_weight='balanced', dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=None, solver='newton-cg', tol=0.0001, verbose=0, warm_start=False))]

To get more details, use this helper function:

from pprint import pprint

def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            print()
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            # Non-ensemble step: print its parameter values.
            pprint(step[1].get_params())
            print()

print_model(fitted_model)


The following sample output is for a pipeline using a specific algorithm (LogisticRegression with RobustScaler, in this case).

{'copy': True,
'quantile_range': [10, 90],
'with_centering': True,
'with_scaling': True}

{'C': 0.18420699693267145,
'class_weight': 'balanced',
'dual': False,
'fit_intercept': True,
'intercept_scaling': 1,
'max_iter': 100,
'multi_class': 'multinomial',
'n_jobs': 1,
'penalty': 'l2',
'random_state': None,
'solver': 'newton-cg',
'tol': 0.0001,
'verbose': 0,
'warm_start': False}

Predict class probability

Models produced using automated ML all have wrapper objects that mirror functionality from their open-source origin class. Most classification model wrapper objects returned by automated ML implement the predict_proba() function, which accepts an array-like or sparse matrix data sample of your features (X values), and returns an n-dimensional array of each sample and its respective class probability.

Assuming you have retrieved the best run and fitted model using the same calls from above, you can call predict_proba() directly from the fitted model, supplying an X_test sample in the appropriate format depending on the model type.

best_run, fitted_model = automl_run.get_output()
class_prob = fitted_model.predict_proba(X_test)

If the underlying model does not support the predict_proba() function or the format is incorrect, a model class-specific exception will be thrown. See the RandomForestClassifier and XGBoost reference docs for examples of how this function is implemented for different model types.
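If you are not sure whether a given model supports class probabilities, you can check for the method before calling it. The following is a minimal sketch with stand-in model classes; the helper name class_probabilities_or_none and both stub classes are hypothetical, and it only guards against a missing method, not against incorrectly formatted input.

```python
def class_probabilities_or_none(model, X):
    """Return model.predict_proba(X) if the method exists, otherwise None."""
    predict_proba = getattr(model, "predict_proba", None)
    if not callable(predict_proba):
        return None
    return predict_proba(X)

class StubProbabilisticModel:
    """Stand-in for a fitted model that exposes predict_proba."""
    def predict_proba(self, X):
        return [[0.8, 0.2] for _ in X]

class StubLabelOnlyModel:
    """Stand-in for a fitted model that only exposes predict."""
    def predict(self, X):
        return [0 for _ in X]

X_sample = [[1.0, 2.0], [3.0, 4.0]]
print(class_probabilities_or_none(StubProbabilisticModel(), X_sample))
print(class_probabilities_or_none(StubLabelOnlyModel(), X_sample))
```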

Model interpretability

Model interpretability allows you to understand why your models made predictions, and the underlying feature importance values. The SDK includes various packages for enabling model interpretability features, both at training and inference time, for local and deployed models.

See the how-to for code samples on how to enable interpretability features specifically within automated machine learning experiments.

For general information on how model explanations and feature importance can be enabled in other areas of the SDK outside of automated machine learning, see the concept article on interpretability.


The ForecastTCN model is not currently supported by the Explanation Client. This model will not return an explanation dashboard if it is returned as the best model, and does not support on-demand explanation runs.

Next steps