AutoMLConfig class

Definition

Configuration for submitting an Automated Machine Learning experiment in Azure Machine Learning service.

This configuration object contains and persists the parameters for configuring the experiment run, as well as the training data to be used at run time. For guidance on selecting your settings, see https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train. The following code shows a basic example of creating an AutoMLConfig object and submitting an experiment with the defined configuration:


   from azureml.core.experiment import Experiment
   from azureml.core.workspace import Workspace
   from azureml.train.automl import AutoMLConfig

   automated_ml_config = AutoMLConfig(task='regression',
                                      X=your_training_features,
                                      y=your_training_labels,
                                      iterations=30,
                                      iteration_timeout_minutes=5,
                                      primary_metric='spearman_correlation')

   ws = Workspace.from_config()
   experiment = Experiment(ws, "your-experiment-name")
   run = experiment.submit(automated_ml_config, show_output=True)
AutoMLConfig(task: str, path: typing.Union[str, NoneType] = None, iterations: typing.Union[int, NoneType] = None, data_script: typing.Union[str, NoneType] = None, primary_metric: typing.Union[str, NoneType] = None, compute_target: typing.Union[typing.Any, NoneType] = None, spark_context: typing.Union[typing.Any, NoneType] = None, X: typing.Union[typing.Any, NoneType] = None, y: typing.Union[typing.Any, NoneType] = None, sample_weight: typing.Union[typing.Any, NoneType] = None, X_valid: typing.Union[typing.Any, NoneType] = None, y_valid: typing.Union[typing.Any, NoneType] = None, sample_weight_valid: typing.Union[typing.Any, NoneType] = None, cv_splits_indices: typing.Union[typing.Any, NoneType] = None, validation_size: typing.Union[float, NoneType] = None, n_cross_validations: typing.Union[int, NoneType] = None, y_min: typing.Union[float, NoneType] = None, y_max: typing.Union[float, NoneType] = None, num_classes: typing.Union[int, NoneType] = None, preprocess: bool = False, lag_length: int = 0, max_cores_per_iteration: int = 1, max_concurrent_iterations: int = 1, iteration_timeout_minutes: typing.Union[int, NoneType] = None, mem_in_mb: typing.Union[int, NoneType] = None, enforce_time_on_windows: bool = True, experiment_timeout_minutes: typing.Union[int, NoneType] = None, experiment_exit_score: typing.Union[float, NoneType] = None, enable_early_stopping: bool = False, blacklist_models: typing.Union[typing.List[str], NoneType] = None, auto_blacklist: bool = True, exclude_nan_labels: bool = True, verbosity: int = 20, enable_tf: bool = False, enable_cache: bool = True, cost_mode: int = 0, whitelist_models: typing.Union[typing.List[str], NoneType] = None, enable_onnx_compatible_models: bool = False, enable_voting_ensemble: bool = True, enable_stack_ensemble: bool = True, debug_log: str = 'automl.log', **kwargs: typing.Any) -> None
Inheritance
builtins.object
AutoMLConfig

Parameters

task
str or Tasks

'classification', 'regression', or 'forecasting' depending on what kind of ML problem to solve.

path
str

Full path to the Azure Machine Learning project folder.

iterations
int

Total number of different algorithm and parameter combinations to test during an Automated Machine Learning experiment.

data_script
str

File path to the user-authored script containing the get_data() function.

primary_metric
str or azureml.train.automl.constants.Metric

The metric that Automated Machine Learning will optimize for model selection. Automated Machine Learning collects more metrics than it can optimize. You can use azureml.train.automl.utilities.get_primary_metrics(task) to get a list of valid metrics for your given task. See https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train for details on how these metrics are calculated.
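A minimal sketch of choosing a primary metric, using the get_primary_metrics helper mentioned above. The data placeholders (your_training_features, your_training_labels) follow the earlier example and must be supplied by you:

```python
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.utilities import get_primary_metrics

# List the metrics that are valid optimization targets for a regression task.
print(get_primary_metrics('regression'))

# Pick one of the returned metrics as the optimization target.
automl_config = AutoMLConfig(task='regression',
                             X=your_training_features,   # placeholder for your data
                             y=your_training_labels,      # placeholder for your data
                             iterations=30,
                             primary_metric='spearman_correlation')
```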

compute_target
azureml.core.compute.AbstractComputeTarget

The Azure Machine Learning compute target to run the Automated Machine Learning experiment on. See https://docs.microsoft.com/azure/machine-learning/service/how-to-auto-train-remote for more information on compute targets.

spark_context
SparkContext

Spark context, only applicable when used inside Azure Databricks/Spark environment.

X
pandas.DataFrame or numpy.ndarray or Dataflow or Dataset or DatasetDefinition

The training features to use when fitting pipelines during an experiment.

y
pandas.DataFrame or numpy.ndarray or Dataflow or Dataset or DatasetDefinition

Training labels to use when fitting pipelines during an experiment. This is the value your model will predict.

sample_weight
pandas.DataFrame or numpy.ndarray or Dataflow

The weight to give to each training sample when running fitting pipelines; each row should correspond to a row in the X and y data.

X_valid
pandas.DataFrame or numpy.ndarray or Dataflow or Dataset or DatasetDefinition

The validation features to use when fitting pipelines during an experiment.

y_valid
pandas.DataFrame or numpy.ndarray or Dataflow or Dataset or DatasetDefinition

The validation labels to use when fitting pipelines during an experiment.

sample_weight_valid
pandas.DataFrame or numpy.ndarray or Dataflow

The weight to give to each validation sample when running scoring pipelines; each row should correspond to a row in the X_valid and y_valid data.

cv_splits_indices
numpy.ndarray

Indices where to split training data for cross validation. Each row is a separate cross fold, and within each cross fold, provide two arrays: the first with the indices of samples to use for training data and the second with the indices to use for validation data, i.e. [[t1, v1], [t2, v2], ...] where t1 is the training indices for the first cross fold and v1 is the validation indices for the first cross fold.
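One way to build indices in the [[t1, v1], [t2, v2], ...] shape described above is with scikit-learn's KFold; this sketch assumes 100 training rows and three folds:

```python
import numpy as np
from sklearn.model_selection import KFold

# Build cv_splits_indices in the expected [[t1, v1], [t2, v2], ...] shape
# from a standard KFold split over the training rows.
n_samples = 100
kf = KFold(n_splits=3, shuffle=True, random_state=42)
cv_splits_indices = [
    [train_idx, valid_idx]
    for train_idx, valid_idx in kf.split(np.arange(n_samples))
]

# Each fold's training and validation indices are disjoint and
# together cover every row exactly once.
for train_idx, valid_idx in cv_splits_indices:
    assert set(train_idx).isdisjoint(set(valid_idx))
    assert len(train_idx) + len(valid_idx) == n_samples
```

The resulting list can be passed directly as the cv_splits_indices argument alongside X and y.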

validation_size
float

The fraction of the data to hold out for validation when user validation data is not specified.

n_cross_validations
int

How many cross validations to perform when user validation data is not specified.

y_min
float

Minimum value of y for a regression experiment.

y_max
float

Maximum value of y for a regression experiment.

num_classes
int

Number of classes in the label data for a classification experiment.

preprocess
bool

Flag for whether Automated Machine Learning should preprocess your data for you, such as handling missing data, text data, and other common feature extraction. Note: if the input data is sparse, preprocess cannot be set to True.

lag_length
int

How many rows of historical data to include when preprocessing time series data.

max_cores_per_iteration
int

Maximum number of threads to use for a given training iteration.

max_concurrent_iterations
int

Maximum number of iterations that are executed in parallel. This should be less than the number of cores on the compute target.

iteration_timeout_minutes
int

Maximum time in minutes that each iteration can run for before it terminates.

mem_in_mb
int

Maximum memory usage, in MB, that each iteration can consume before it terminates.

enforce_time_on_windows
bool

Flag to enforce a time limit on model training at each iteration on Windows. If running from a Python script file (.py), refer to the documentation for allowing resource limits on Windows.

experiment_timeout_minutes
int

Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.

experiment_exit_score
float

Target score for the experiment. The experiment will terminate after this score is reached.

enable_early_stopping
bool

Flag to enable early termination if the score is not improving in the short term.

blacklist_models
list(str) or list(SupportedAlgorithms)

List of algorithms to ignore for an experiment.

exclude_nan_labels
bool

Flag whether to exclude rows with NaN values in the label.

auto_blacklist
bool

Flag for whether Automated Machine Learning should try to automatically exclude algorithms that it thinks won't perform well or may take a disproportionately long time to train.

verbosity
int

Verbosity level for log file.

enable_tf
bool

Flag to enable/disable TensorFlow algorithms.

enable_cache
bool

Flag to enable/disable disk cache for transformed, preprocessed data.

cost_mode
int or automl.client.core.common.constants.PipelineCost

Flag to set cost prediction modes. COST_NONE means no cost prediction; COST_FILTER means cost prediction per iteration.

whitelist_models
list(str) or list(SupportedAlgorithms)

List of model names to search for an experiment.
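A hedged sketch of restricting the search space with whitelist_models. The model names shown are illustrative; valid names should be taken from azureml.train.automl.constants.SupportedAlgorithms, and the data placeholders must be supplied by you:

```python
from azureml.train.automl import AutoMLConfig

# Restrict the experiment to two model families (names are illustrative).
automl_config = AutoMLConfig(task='classification',
                             X=your_training_features,   # placeholder for your data
                             y=your_training_labels,      # placeholder for your data
                             iterations=20,
                             primary_metric='AUC_weighted',
                             whitelist_models=['LightGBM', 'LogisticRegression'])
```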

enable_onnx_compatible_models
bool

Flag to enable/disable enforcing ONNX-compatible models.

time_column_name
str

The name of your time column.

max_horizon
int

The number of periods out you would like to predict past your training data. Periods are inferred from your data.

grain_column_names
List[str]

The names of columns used to group your time series. These can be used to create multiple series.

drop_column_names
List[str]

The names of columns to drop.

target_lags
int

The number of past periods to lag from the target column.

target_rolling_window_size
int

The number of past periods used to create a rolling window average of the target column.

country
str

The country used to generate holiday features. This should be an ISO 3166 two-letter country code (e.g. 'US', 'GB').
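The forecasting parameters above can be combined into a single configuration. This is a sketch under assumed column names ('date' and 'store_id' are hypothetical; substitute the columns in your own data):

```python
from azureml.train.automl import AutoMLConfig

# Forecasting configuration sketch; column names are hypothetical.
automl_config = AutoMLConfig(task='forecasting',
                             X=your_training_features,   # placeholder for your data
                             y=your_training_labels,      # placeholder for your data
                             iterations=10,
                             primary_metric='normalized_root_mean_squared_error',
                             time_column_name='date',         # hypothetical time column
                             max_horizon=12,                  # predict 12 periods ahead
                             grain_column_names=['store_id'], # hypothetical grouping column
                             target_lags=2,
                             target_rolling_window_size=4,
                             country='US')
```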

enable_voting_ensemble
bool

Flag to enable/disable VotingEnsemble iteration.

enable_stack_ensemble
bool

Flag to enable/disable StackEnsemble iteration.

debug_log
str

Log file to write debug information to.