Configure automated ML experiments in Python

In this guide, learn how to define various configuration settings of your automated machine learning experiments with the Azure Machine Learning SDK. Automated machine learning picks an algorithm and hyperparameters for you and generates a model ready for deployment. There are several options that you can use to configure automated machine learning experiments.

To view examples of automated machine learning experiments, see Tutorial: Train a classification model with automated machine learning or Train models with automated machine learning in the cloud.

Configuration options available in automated machine learning:

  • Select your experiment type: Classification, Regression or Time Series Forecasting
  • Data source, formats, and fetch data
  • Choose your compute target: local or remote
  • Automated machine learning experiment settings
  • Run an automated machine learning experiment
  • Explore model metrics
  • Register and deploy model

If you prefer a no-code experience, you can also create your automated machine learning experiments in the Azure portal.

Select your experiment type

Before you begin your experiment, you should determine the kind of machine learning problem you are solving. Automated machine learning supports task types of classification, regression and forecasting.

Automated machine learning supports the following algorithms during the automation and tuning process. As a user, there is no need for you to specify the algorithm.

Classification | Regression | Time Series Forecasting
Logistic Regression | Elastic Net | Elastic Net
Light GBM | Light GBM | Light GBM
Gradient Boosting | Gradient Boosting | Gradient Boosting
Decision Tree | Decision Tree | Decision Tree
K Nearest Neighbors | K Nearest Neighbors | K Nearest Neighbors
Linear SVC | LARS Lasso | LARS Lasso
C-Support Vector Classification (SVC) | Stochastic Gradient Descent (SGD) | Stochastic Gradient Descent (SGD)
Random Forest | Random Forest | Random Forest
Extremely Randomized Trees | Extremely Randomized Trees | Extremely Randomized Trees
XGBoost | XGBoost | XGBoost
DNN Classifier | DNN Regressor | DNN Regressor
DNN Linear Classifier | Linear Regressor | Linear Regressor
Naive Bayes | |
Stochastic Gradient Descent (SGD) | |

Use the task parameter in the AutoMLConfig constructor to specify your experiment type.

from azureml.train.automl import AutoMLConfig

# task can be one of classification, regression, forecasting
automl_config = AutoMLConfig(task="classification")

Data source and format

Automated machine learning supports data that resides on your local desktop or in the cloud, such as in Azure Blob Storage. The data can be read into scikit-learn supported data formats. You can read the data into:

  • NumPy arrays X (features) and y (target variable, also known as the label)
  • Pandas dataframe


Requirements for training data:

  • Data must be in tabular form.
  • The value you want to predict (target column) must be present in the data.


Examples:

  • NumPy arrays

    from sklearn import datasets

    digits = datasets.load_digits()
    X_digits = digits.data      # feature matrix
    y_digits = digits.target    # labels
  • Pandas dataframe

    import pandas as pd
    from sklearn.model_selection import train_test_split
    df = pd.read_csv(",_1.6MB,_3.4k-rows.cleaned.2.tsv", delimiter="\t", quotechar='"')
    # get integer labels
    y = df["Label"]
    df = df.drop(["Label"], axis=1)
    df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)

Fetch data for running experiment on remote compute

For remote executions, you need to make the data accessible from the remote compute. You can do this by uploading the data to a datastore.

Here is an example that uses a datastore:

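    import os
    import pandas as pd
    from sklearn import datasets

    data_train = datasets.load_digits()

    # Write the features and labels (rows 100 onward) to local CSV files.
    os.makedirs("data", exist_ok=True)
    pd.DataFrame(data_train.data[100:,:]).to_csv("data/X_train.csv", index=False)
    pd.DataFrame(data_train.target[100:]).to_csv("data/y_train.csv", index=False)

    # Upload the files to the default datastore (ws is an existing Workspace object).
    ds = ws.get_default_datastore()
    ds.upload(src_dir='./data', target_path='digitsdata', overwrite=True, show_progress=True)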

Define dprep references

Define X and y as dprep references, and then pass them to the AutoMLConfig object, similar to the following:

    import azureml.dataprep as dprep

    X = dprep.auto_read_file(path=ds.path('digitsdata/X_train.csv'))
    y = dprep.auto_read_file(path=ds.path('digitsdata/y_train.csv'))

    # project_folder is a local folder path, such as the one defined in the "Run experiment" section.
    automl_config = AutoMLConfig(task = 'classification',
                                 debug_log = 'automl_errors.log',
                                 path = project_folder,
                                 X = X,
                                 y = y)

Train and validation data

You can specify separate training and validation sets directly in the AutoMLConfig constructor.

K-Folds Cross Validation

Use the n_cross_validations setting to specify the number of cross-validation folds. The training data set is randomly split into n_cross_validations folds of equal size. During each cross-validation round, one fold is used to validate the model trained on the remaining folds. This process repeats for n_cross_validations rounds, until each fold has been used once as the validation set. The average score across all n_cross_validations rounds is reported, and the corresponding model is retrained on the whole training data set.
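To make the procedure above concrete, here is a minimal pure-Python sketch of k-fold splitting and score averaging. It does not call the Azure Machine Learning SDK: k_fold_indices, cross_validate, and the toy scoring function are hypothetical names used for illustration only, and the SDK's internal implementation may differ.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle row indices and deal them into n_folds near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(n_samples, n_folds, score_fn, seed=0):
    """Use each fold once for validation and average the per-round scores."""
    folds = k_fold_indices(n_samples, n_folds, seed)
    scores = []
    for i, valid_idx in enumerate(folds):
        # Training rows are all rows that are not in the current validation fold.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        scores.append(score_fn(train_idx, valid_idx))
    return mean(scores)


# Toy score (an assumption for the demo): fraction of rows used for training.
avg = cross_validate(10, 5, lambda train, valid: len(train) / (len(train) + len(valid)))
print(avg)
```

In a real experiment, the scoring function would train a model on the training rows and evaluate the primary metric on the validation rows.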

Monte Carlo Cross Validation (Repeated Random Sub-Sampling)

Use validation_size to specify the percentage of the training dataset that should be used for validation, and use n_cross_validations to specify the number of cross validations. During each cross validation round, a subset of size validation_size will be randomly selected for validation of the model trained on the remaining data. Finally, the average scores across all n_cross_validations rounds will be reported, and the corresponding model will be retrained on the whole training data set. Monte Carlo is not supported for time series forecasting.
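The repeated random sub-sampling described above can be sketched in plain Python as follows. This is only an illustration under simple assumptions: monte_carlo_splits is a hypothetical helper, not an SDK function, and unlike k-fold, the validation subsets of different rounds may overlap.

```python
import random


def monte_carlo_splits(n_samples, validation_size, n_rounds, seed=0):
    """Yield (train, valid) index lists; each round draws a fresh random validation subset."""
    rng = random.Random(seed)
    n_valid = max(1, round(n_samples * validation_size))
    for _ in range(n_rounds):
        valid = set(rng.sample(range(n_samples), n_valid))
        train = [i for i in range(n_samples) if i not in valid]
        yield train, sorted(valid)


# 20 rows, 20% held out for validation, 3 rounds.
splits = list(monte_carlo_splits(n_samples=20, validation_size=0.2, n_rounds=3))
for train_idx, valid_idx in splits:
    print(len(train_idx), len(valid_idx))
```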

Custom validation dataset

Use a custom validation dataset if a random split is not acceptable, which is usually the case for time series data or imbalanced data. You can specify your own validation dataset, and the model is then evaluated against it instead of against a random subset of the training data.
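For time series data, one common way to build a custom validation set is to hold out the most recent rows. The sketch below shows that idea in plain Python. The chronological_split helper and the sample rows are hypothetical; the resulting sets could then be supplied as the training and validation data (for example, through the X_valid and y_valid arguments used later in this guide).

```python
from datetime import date


def chronological_split(rows, time_key, n_valid):
    """Sort rows by time and hold out the most recent n_valid rows (n_valid >= 1)."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    return ordered[:-n_valid], ordered[-n_valid:]


# Hypothetical daily sales rows, deliberately out of order.
rows = [{"day": date(2019, 1, d), "sales": 100 + d} for d in (5, 1, 4, 2, 3)]
train_rows, valid_rows = chronological_split(rows, "day", n_valid=2)
print([r["day"].day for r in valid_rows])
```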

Compute to run experiment

Next, determine where the model will be trained. An automated machine learning training experiment can run on the following compute options:

  • Your local machine, such as a local desktop or laptop – Generally used when you have a small dataset and you are still in the exploration stage.
  • A remote machine in the cloud – Azure Machine Learning Managed Compute is a managed service that lets you train machine learning models on clusters of Azure virtual machines.

See the GitHub site for example notebooks with local and remote compute targets.

See the GitHub site for example notebooks with Azure Databricks.

Configure your experiment settings

There are several options that you can use to configure your automated machine learning experiment. These parameters are set by instantiating an AutoMLConfig object. See the AutoMLConfig class for a full list of parameters.

Some examples include:

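  1. Classification experiment using AUC weighted as the primary metric with a max time of 12,000 seconds per iteration, with the experiment to end after 50 iterations and 2 cross validation folds.

    # Argument values follow the description above; X and y are the training data.
    automl_classifier = AutoMLConfig(
        task='classification',
        primary_metric='AUC_weighted',
        max_time_sec=12000,
        iterations=50,
        X=X,
        y=y,
        n_cross_validations=2)

  2. Regression experiment set to end after 100 iterations, with each iteration lasting up to 600 seconds, and 5 cross validation folds.

    # Argument values follow the description above; X and y are the training data.
    automl_regressor = AutoMLConfig(
        task='regression',
        max_time_sec=600,
        iterations=100,
        whitelist_models='kNN regressor',
        X=X,
        y=y,
        n_cross_validations=5)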

The value of the task parameter determines the list of algorithms to apply. Use the whitelist or blacklist parameters to include or exclude specific algorithms from the iterations. The list of supported models can be found in the SupportedAlgorithms class.

Primary Metric

The primary metric, as shown in the examples above, determines the metric to be optimized during model training. The primary metrics you can select are determined by the task type you choose. Below is a list of available metrics.

Classification | Regression | Time Series Forecasting
accuracy | spearman_correlation | spearman_correlation
AUC_weighted | normalized_root_mean_squared_error | normalized_root_mean_squared_error
average_precision_score_weighted | r2_score | r2_score
norm_macro_recall | normalized_mean_absolute_error | normalized_mean_absolute_error
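Because the valid primary metrics depend on the task type, it can help to check the combination before you build a configuration. The following is a small sketch of such a check. The metric names are copied from the list above, while PRIMARY_METRICS and check_primary_metric are hypothetical helpers that are not part of the SDK.

```python
# Primary metrics per task type, copied from the list above.
_REGRESSION_METRICS = [
    "spearman_correlation",
    "normalized_root_mean_squared_error",
    "r2_score",
    "normalized_mean_absolute_error",
]
PRIMARY_METRICS = {
    "classification": [
        "accuracy",
        "AUC_weighted",
        "average_precision_score_weighted",
        "norm_macro_recall",
    ],
    "regression": _REGRESSION_METRICS,
    "forecasting": _REGRESSION_METRICS,
}


def check_primary_metric(task, metric):
    """Return the metric if it is listed for the task type; otherwise raise ValueError."""
    allowed = PRIMARY_METRICS.get(task)
    if allowed is None:
        raise ValueError(f"Unknown task type: {task!r}")
    if metric not in allowed:
        raise ValueError(f"{metric!r} is not listed for {task!r}; choose one of {allowed}")
    return metric


print(check_primary_metric("classification", "AUC_weighted"))
```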

Data preprocessing & featurization

In every automated machine learning experiment, your data is automatically scaled and normalized to help algorithms perform well. However, you can also enable additional preprocessing/featurization, such as missing values imputation, encoding, and transforms. Learn more about what featurization is included.

To enable this featurization, specify "preprocess": True for the AutoMLConfig class.


Automated machine learning pre-processing steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same pre-processing steps applied during training are applied to your input data automatically.

Time Series Forecasting

For the time series forecasting task type, you have additional parameters to define.

  1. time_column_name - This is a required parameter that defines the name of the column in your training data containing the date/time series.
  2. max_horizon - This defines the length of time you want to predict out, based on the periodicity of the training data. For example, if you have training data with daily time grains, you define how far out in days you want the model to train for.
  3. grain_column_names - This defines the names of the columns that contain individual time series data in your training data. For example, if you are forecasting sales of a particular brand by store, you would define the store and brand columns as your grain columns.

See the following example of these settings in use. A notebook example is also available here.

# Setting Store and Brand as grains for training.
# data is the training DataFrame; time_column_name and n_test_periods are defined elsewhere.
grain_column_names = ['Store', 'Brand']
nseries = data.groupby(grain_column_names).ngroups

# View the number of time series data with defined grains
print('Data contains {0} individual time-series.'.format(nseries))

time_series_settings = {
    'time_column_name': time_column_name,
    'grain_column_names': grain_column_names,
    'drop_column_names': ['logQuantity'],
    'max_horizon': n_test_periods
}

# Pass the time series settings as keyword arguments; other arguments are omitted here.
automl_config = AutoMLConfig(task='forecasting',
                             **time_series_settings)

Ensemble configuration

Ensemble models are enabled by default, and appear as the final run iterations in an automated machine learning run. Currently supported ensemble methods are voting and stacking. Voting is implemented as soft-voting using weighted averages, and the stacking implementation is using a 2 layer implementation, where the first layer has the same models as the voting ensemble, and the second layer model is used to find the optimal combination of the models from the first layer. If you are using ONNX models, or have model-explainability enabled, stacking will be disabled and only voting will be utilized.
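As an illustration of soft voting with weighted averages, the sketch below averages the class probabilities predicted by several models and picks the class with the highest averaged probability. It is a generic example of the technique, not the SDK's implementation; soft_vote, the probabilities, and the weights are all hypothetical.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities.

    Returns (index of the winning class, averaged probabilities).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__), averaged


# Three hypothetical models scoring one sample over two classes.
model_probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
label, averaged = soft_vote(model_probs, weights=[0.5, 0.25, 0.25])
print(label, averaged)
```

In this example, two of the three models favor class 1, so a simple majority vote would choose class 1. Soft voting chooses class 0 because the most heavily weighted model is very confident in it.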

There are multiple default arguments that can be provided as kwargs in an AutoMLConfig object to alter the default stack ensemble behavior.

  • stack_meta_learner_type: the meta-learner is a model trained on the output of the individual heterogeneous models. Default meta-learners are LogisticRegression for classification tasks (or LogisticRegressionCV if cross-validation is enabled) and ElasticNet for regression/forecasting tasks (or ElasticNetCV if cross-validation is enabled). This parameter can be one of the following strings: LogisticRegression, LogisticRegressionCV, LightGBMClassifier, ElasticNet, ElasticNetCV, LightGBMRegressor, or LinearRegression.
  • stack_meta_learner_train_percentage: specifies the proportion of the training set (when choosing train and validation type of training) to be reserved for training the meta-learner. Default value is 0.2.
  • stack_meta_learner_kwargs: optional parameters to pass to the initializer of the meta-learner. These parameters and parameter types mirror those from the corresponding model constructor, and are forwarded to the model constructor.

The following code shows an example of specifying custom ensemble behavior in an AutoMLConfig object.

ensemble_settings = {
    "stack_meta_learner_type": "LogisticRegressionCV",
    "stack_meta_learner_train_percentage": 0.3,
    "stack_meta_learner_kwargs": {
        "refit": True,
        "fit_intercept": False,
        "class_weight": "balanced",
        "multi_class": "auto",
        "n_jobs": -1
    }
}

# Pass the ensemble settings as keyword arguments; other arguments are omitted here.
automl_classifier = AutoMLConfig(
    task='classification',
    **ensemble_settings)

Ensemble training is enabled by default, but it can be disabled by using the enable_voting_ensemble and enable_stack_ensemble boolean parameters.

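# Disable both ensemble types; other arguments are omitted here.
automl_classifier = AutoMLConfig(
    task='classification',
    enable_voting_ensemble=False,
    enable_stack_ensemble=False)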

Run experiment

For automated ML you create an Experiment object, which is a named object in a Workspace used to run experiments.

from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace

ws = Workspace.from_config()

# Choose a name for the experiment and specify the project folder.
experiment_name = 'automl-classification'
project_folder = './sample_projects/automl-classification'

experiment = Experiment(ws, experiment_name)

Submit the experiment to run and generate a model. Pass the AutoMLConfig to the submit method to generate the model.

run = experiment.submit(automl_config, show_output=True)


Dependencies are first installed on a new machine, so it may take up to 10 minutes before output is shown. Setting show_output to True displays the output on the console.

Exit Criteria

There are a few options you can define to complete your experiment.

  1. No criteria - If you do not define any exit parameters, the experiment continues until no further progress is made on your primary metric.
  2. Number of iterations - You define the number of iterations for the experiment to run. You can optionally add iteration_timeout_minutes to define a time limit in minutes for each iteration.
  3. Exit after a length of time - Using experiment_timeout_minutes in your settings, you can define how long, in minutes, an experiment should continue to run.
  4. Exit after a score has been reached - Using experiment_exit_score, you can choose to complete the experiment after a score based on your primary metric has been reached.
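As a sketch, the exit criteria above could be collected in a settings dictionary like the one below. The keys are the parameter names mentioned in this guide; the values are hypothetical and should be adjusted to your experiment.

```python
# Hypothetical values; the keys are the parameter names mentioned above.
exit_settings = {
    "iterations": 50,                   # stop after 50 iterations
    "iteration_timeout_minutes": 10,    # limit each iteration to 10 minutes
    "experiment_timeout_minutes": 120,  # stop the whole experiment after 2 hours
    "experiment_exit_score": 0.95,      # stop once the primary metric reaches 0.95
}

for name, value in exit_settings.items():
    print(f"{name} = {value}")
```

A dictionary like this could then be passed as keyword arguments, for example AutoMLConfig(task='classification', **exit_settings), in the same way the other settings dictionaries in this guide are used.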

Explore model metrics

You can view your training results in a widget or inline if you are in a notebook. See Track and evaluate models for more details.

Understand automated ML models

Any model produced using automated ML includes the following steps:

  • Automated feature engineering (if preprocess=True)
  • Scaling/Normalization and algorithm with hyperparameter values

You can get this information from the fitted_model output of automated ML.

automl_config = AutoMLConfig(…)
automl_run = experiment.submit(automl_config …)
best_run, fitted_model = automl_run.get_output()

Automated feature engineering

See the list of preprocessing and automated feature engineering that happens when preprocess=True.

Consider this example:

  • There are 4 input features: A (Numeric), B (Numeric), C (Numeric), D (DateTime)
  • Numeric feature C is dropped because it is an ID column with all unique values
  • Numeric features A and B have missing values and hence are imputed by mean
  • DateTime feature D is featurized into 11 different engineered features

Use these 2 APIs on the first step of the fitted model to understand more. See this sample notebook.

  • API 1: get_engineered_feature_names() returns a list of engineered feature names.

    fitted_model.named_steps['timeseriestransformer'].get_engineered_feature_names()
    Output: ['A', 'B', 'A_WASNULL', 'B_WASNULL', 'year', 'half', 'quarter', 'month', 'day', 'hour', 'am_pm', 'hour12', 'wday', 'qday', 'week']

    This list includes all engineered feature names.

    Use 'timeseriestransformer' for task='forecasting'; otherwise, use 'datatransformer' for a 'regression' or 'classification' task.

  • API 2: get_featurization_summary() returns a featurization summary for all the input features.

    fitted_model.named_steps['timeseriestransformer'].get_featurization_summary()

    Use 'timeseriestransformer' for task='forecasting'; otherwise, use 'datatransformer' for a 'regression' or 'classification' task.

    Output:


    [{'RawFeatureName': 'A',
      'TypeDetected': 'Numeric',
      'Dropped': 'No',
      'EngineeredFeatureCount': 2,
      'Transformations': ['MeanImputer', 'ImputationMarker']},
    {'RawFeatureName': 'B',
      'TypeDetected': 'Numeric',
      'Dropped': 'No',
      'EngineeredFeatureCount': 2,
      'Transformations': ['MeanImputer', 'ImputationMarker']},
    {'RawFeatureName': 'C',
      'TypeDetected': 'Numeric',
      'Dropped': 'Yes',
      'EngineeredFeatureCount': 0,
      'Transformations': []},
    {'RawFeatureName': 'D',
      'TypeDetected': 'DateTime',
      'Dropped': 'No',
      'EngineeredFeatureCount': 11,
      'Transformations': ['DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime','DateTime']}]


    Output | Definition
    RawFeatureName | Input feature/column name from the dataset provided.
    TypeDetected | Detected datatype of the input feature.
    Dropped | Indicates if the input feature was dropped or used.
    EngineeredFeatureCount | Number of features generated through automated feature engineering transforms.
    Transformations | List of transformations applied to input features to generate engineered features.
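Because the featurization summary is a list of dictionaries, you can inspect it with ordinary Python. The sketch below uses a shortened copy of the sample output above (the transformation lists are left out) to find the dropped features and count the engineered features. The variable names are illustrative only.

```python
# Shortened copy of the sample summary above (transformation lists omitted).
summary = [
    {"RawFeatureName": "A", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 2},
    {"RawFeatureName": "B", "TypeDetected": "Numeric", "Dropped": "No", "EngineeredFeatureCount": 2},
    {"RawFeatureName": "C", "TypeDetected": "Numeric", "Dropped": "Yes", "EngineeredFeatureCount": 0},
    {"RawFeatureName": "D", "TypeDetected": "DateTime", "Dropped": "No", "EngineeredFeatureCount": 11},
]

# Input features that were dropped during featurization.
dropped = [f["RawFeatureName"] for f in summary if f["Dropped"] == "Yes"]

# Total number of engineered features across all input features.
total_engineered = sum(f["EngineeredFeatureCount"] for f in summary)

print(dropped, total_engineered)
```

The total of 15 matches the number of names returned by get_engineered_feature_names() in the example above.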

Scaling/Normalization and algorithm with hyperparameter values:

To understand the scaling/normalization and algorithm/hyperparameter values for a pipeline, use fitted_model.steps. Learn more about scaling/normalization. Here is a sample output:

[('RobustScaler', RobustScaler(copy=True, quantile_range=[10, 90], with_centering=True, with_scaling=True)), ('LogisticRegression', LogisticRegression(C=0.18420699693267145, class_weight='balanced', dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='multinomial', n_jobs=1, penalty='l2', random_state=None, solver='newton-cg', tol=0.0001, verbose=0, warm_start=False))]

To get more details, use this helper function shown in this sample notebook.

from pprint import pprint

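def print_model(model, prefix=""):
    for step in model.steps:
        print(prefix + step[0])
        if hasattr(step[1], 'estimators') and hasattr(step[1], 'weights'):
            # Ensemble step: list its member estimators and weights, then recurse into each one.
            pprint({'estimators': list(
                e[0] for e in step[1].estimators), 'weights': step[1].weights})
            for estimator in step[1].estimators:
                print_model(estimator[1], estimator[0] + ' - ')
        else:
            # Single transformer or estimator: print its parameters.
            pprint(step[1].get_params())
            print()

print_model(fitted_model)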


The following is sample output for a pipeline using a specific algorithm (LogisticRegression with RobustScaler, in this case).

{'copy': True,
'quantile_range': [10, 90],
'with_centering': True,
'with_scaling': True}

{'C': 0.18420699693267145,
'class_weight': 'balanced',
'dual': False,
'fit_intercept': True,
'intercept_scaling': 1,
'max_iter': 100,
'multi_class': 'multinomial',
'n_jobs': 1,
'penalty': 'l2',
'random_state': None,
'solver': 'newton-cg',
'tol': 0.0001,
'verbose': 0,
'warm_start': False}

Explain the model (interpretability)

Automated machine learning allows you to understand feature importance. During the training process, you can get global feature importance for the model. For classification scenarios, you can also get class-level feature importance. You must provide a validation dataset (X_valid) to get feature importance.

There are two ways to generate feature importance.

  • Once an experiment is complete, you can use the explain_model method on any iteration.

    from azureml.train.automl.automlexplainer import explain_model
    shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \
        explain_model(fitted_model, X_train, X_test)
    #Overall feature importance
    print(overall_imp)
    print(overall_summary)
    #Class-level feature importance
    print(per_class_imp)
    print(per_class_summary)
  • To view feature importance for all iterations, set the model_explainability flag to True in AutoMLConfig.

    import logging

    automl_config = AutoMLConfig(task = 'classification',
                                 debug_log = 'automl_errors.log',
                                 primary_metric = 'AUC_weighted',
                                 max_time_sec = 12000,
                                 iterations = 10,
                                 verbosity = logging.INFO,
                                 X = X_train,
                                 y = y_train,
                                 X_valid = X_test,
                                 y_valid = y_test,
                                 model_explainability = True)

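    Once done, you can use the retrieve_model_explanation method to retrieve feature importance for a specific iteration.

    from azureml.train.automl.automlexplainer import retrieve_model_explanation
    # best_run is the run for the iteration of interest, such as the one returned by get_output().
    shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \
        retrieve_model_explanation(best_run)
    #Overall feature importance
    print(overall_imp)
    print(overall_summary)
    #Class-level feature importance
    print(per_class_imp)
    print(per_class_summary)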

You can visualize the feature importance chart in your workspace in the Azure portal. Display the URL using the run object:

automl_run.get_portal_url()

The chart is also shown when using the RunDetails Jupyter widget in a notebook. To learn more about the charts, refer to Understand automated machine learning results.

from azureml.widgets import RunDetails
RunDetails(automl_run).show()

[Figure: feature importance graph]

For more information on how model explanations and feature importance can be enabled in other areas of the SDK outside of automated machine learning, see the concept article on interpretability.

Next steps

Learn more about how and where to deploy a model.

Learn more about how to train a regression model with automated machine learning or how to train using automated machine learning on a remote resource.