AutoMLStep Class

Creates an Azure ML Pipeline step that encapsulates an automated ML run.

For an example of using AutoMLStep, see the notebook https://aka.ms/pl-automl.

Inheritance
AutoMLStep

Constructor

AutoMLStep(name, automl_config, inputs=None, outputs=None, script_repl_params=None, allow_reuse=True, version=None, hash_paths=None, enable_default_model_output=True, enable_default_metrics_output=True, **kwargs)

Parameters

name
str
Required

The name of the step.

automl_config
AutoMLConfig
Required

An AutoMLConfig object that defines the configuration for this AutoML run.

inputs
list[Union[InputPortBinding, DataReference, PortDataReference, PipelineData]]
default value: None

A list of input port bindings.

outputs
list[Union[PipelineData, OutputPortBinding]]
default value: None

A list of output port bindings.

script_repl_params
dict
default value: None

Optional parameters to be replaced in a script, for example {'param1': 'value1', 'param2': 'value2'}.

allow_reuse
bool
default value: True

Indicates whether the step should reuse previous results when re-run with the same settings.

Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed.

version
str
default value: None

A version to assign to the step.

hash_paths
list
default value: None

DEPRECATED. A list of paths to hash when checking for changes to the pipeline step contents.

By default, all files under the path parameter in AutoMLConfig are hashed except files listed in .amlignore or .gitignore under path. If there are no changes detected, the pipeline reuses the step contents from a previous run.

enable_default_model_output
bool
default value: True

Indicates whether or not the best model will be added as a default output. This can be used to retrieve the best model after the run has completed using the AutoMLStepRun class. Note, if the default model output is not required, it is recommended to set this parameter to False.

enable_default_metrics_output
bool
default value: True

Indicates whether or not all child run metrics will be added as a default output. This can be used to retrieve the child run metrics after the run has completed using the AutoMLStepRun class. Note, if the default metrics output is not required, it is recommended to set this parameter to False.
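
The reuse rules described for allow_reuse and hash_paths can be pictured with a small model. The sketch below is not the SDK's implementation: `load_ignore_patterns` and `step_fingerprint` are hypothetical helpers, and ignore-file entries are simplified to `fnmatch`-style globs. It hashes every non-ignored file under a source folder together with the input definitions and the parameters, so that a step would be reused only when the fingerprint matches the one from the previous run.

```python
import fnmatch
import hashlib
import json
import os


def load_ignore_patterns(path):
    """Collect glob patterns from .amlignore / .gitignore under path (simplified)."""
    patterns = []
    for ignore_file in (".amlignore", ".gitignore"):
        full = os.path.join(path, ignore_file)
        if os.path.isfile(full):
            with open(full, encoding="utf-8") as fh:
                patterns += [ln.strip() for ln in fh
                             if ln.strip() and not ln.startswith("#")]
    return patterns


def step_fingerprint(path, input_definitions, parameters):
    """Hash step contents, input definitions, and parameters into one digest."""
    patterns = load_ignore_patterns(path)
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(path):
        dirs.sort()  # keep the traversal order deterministic
        for name in sorted(files):
            full = os.path.join(root, name)
            rel = os.path.relpath(full, path).replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, p) for p in patterns):
                continue  # ignored files never affect the reuse decision
            digest.update(rel.encode("utf-8"))
            with open(full, "rb") as fh:
                digest.update(fh.read())
    # Only the *definition* of each input is hashed, not the underlying data.
    digest.update(json.dumps(input_definitions, sort_keys=True).encode("utf-8"))
    digest.update(json.dumps(parameters, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()
```

Under this model, a change to a script or a parameter produces a new fingerprint and forces a fresh run, while new data arriving behind an unchanged dataset definition does not.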

Remarks

With the AutoMLStep class you can run your automated ML workflow in an Azure Machine Learning pipeline. Pipelines provide benefits such as repeatability, unattended runs, versioning and tracking, and modularity for your automated ML workflow. For more information, see the article What are Azure Machine Learning pipelines?

When your automated ML workflow is in a pipeline, you can schedule the pipeline to run on a time-based schedule or on a change-based schedule. Time-based schedules are useful for routine tasks such as monitoring data drift, while change-based schedules are useful for irregular or unpredictable changes, such as when data changes. For example, a schedule might poll a blob store where data is being uploaded, run the pipeline again if the data changes, and then register a new version of the model once the run is complete. For more information, see Schedule machine learning pipelines and Trigger a run of a Machine Learning pipeline from a Logic App.
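
The two schedule types can be sketched with the `Schedule` and `ScheduleRecurrence` classes from `azureml.pipeline.core.schedule`. This is a minimal sketch, assuming the pipeline has already been published and that `ws`, `pipeline_id`, and `datastore` are supplied by the caller. The function name, the schedule names, the experiment name, and the `training-data` folder are placeholders chosen for illustration. The SDK import is deferred into the function so the module can be loaded without the SDK installed.

```python
def create_retraining_schedules(ws, pipeline_id, datastore,
                                experiment_name="automl-retrain"):
    """Create one time-based and one change-based schedule for a published pipeline."""
    # Deferred import: requires the azureml-pipeline-core package.
    from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

    # Time-based: run once a day, for example for routine drift monitoring.
    daily = Schedule.create(
        ws,
        name="daily-retrain",  # placeholder name
        pipeline_id=pipeline_id,
        experiment_name=experiment_name,
        recurrence=ScheduleRecurrence(frequency="Day", interval=1),
    )

    # Change-based: poll a folder on a blob datastore and run when files change.
    on_change = Schedule.create(
        ws,
        name="retrain-on-new-data",  # placeholder name
        pipeline_id=pipeline_id,
        experiment_name=experiment_name,
        datastore=datastore,
        path_on_datastore="training-data",  # placeholder folder
        polling_interval=5,  # minutes between polls
    )
    return daily, on_change
```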

The following example shows how to create an AutoMLStep.


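   from azureml.pipeline.steps import AutoMLStep

   # automl_config is an AutoMLConfig; metrics_data and model_data are
   # pipeline outputs assumed to be defined earlier.
   automl_step = AutoMLStep(
       name='automl_module',
       automl_config=automl_config,
       outputs=[metrics_data, model_data],
       allow_reuse=True)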

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb

The following example shows how to use the AutoMLStep object in a Pipeline.


   from azureml.pipeline.core import Pipeline

   # ws is the Workspace object in which the pipeline runs.
   pipeline = Pipeline(
       description="pipeline_with_automlstep",
       workspace=ws,
       steps=[automl_step])

The above example shows one step in the pipeline. However, when using AutoMLStep in a real-world automated ML workflow, you will have at least one pipeline step that performs data preparation before the AutoMLStep, and another pipeline step afterward that registers the model. For an example of this type of workflow, see the notebook https://aka.ms/automl-retrain-pipeline.
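
The three-step shape described above might look like the following. This is a minimal sketch, not the code from the linked notebook: the function name, the step names `prepare_data` and `register_model`, the scripts `prep.py` and `register.py`, and the `--model-path` argument are hypothetical, and `automl_config`, `metrics_data`, `model_data`, and `compute_target` are assumed to be created by the caller. Because this sketch does not wire the preparation output into `automl_config`, it orders the steps explicitly with `run_after`.

```python
def build_retraining_pipeline(ws, automl_config, metrics_data, model_data,
                              compute_target, source_directory="scripts"):
    """Assemble a prepare -> AutoML -> register pipeline (illustrative only)."""
    # Deferred imports: require the azureml-pipeline packages.
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import AutoMLStep, PythonScriptStep

    # Step 1: data preparation (prep.py is a hypothetical script).
    prep_step = PythonScriptStep(
        name="prepare_data",
        script_name="prep.py",
        source_directory=source_directory,
        compute_target=compute_target,
    )

    # Step 2: the automated ML run.
    automl_step = AutoMLStep(
        name="automl_module",
        automl_config=automl_config,
        outputs=[metrics_data, model_data],
        allow_reuse=True,
    )
    # No data dependency links the two steps here, so order them explicitly.
    automl_step.run_after(prep_step)

    # Step 3: register the best model (register.py is a hypothetical script).
    register_step = PythonScriptStep(
        name="register_model",
        script_name="register.py",
        arguments=["--model-path", model_data],
        inputs=[model_data],
        source_directory=source_directory,
        compute_target=compute_target,
    )

    return Pipeline(
        workspace=ws,
        description="prepare_train_register",
        steps=[prep_step, automl_step, register_step],
    )
```

The registration step runs after the AutoMLStep because it consumes `model_data`, which the AutoMLStep produces.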

To manage, check status, and get run details from the pipeline run, use the AutoMLStepRun class.
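
As a minimal sketch of that retrieval, assuming `pipeline_run` is the PipelineRun returned when the pipeline was submitted and that the step was named `automl_module` as in the example above, the default outputs could be fetched as follows. The helper name `get_default_outputs` is hypothetical. The default outputs exist only when enable_default_metrics_output and enable_default_model_output were left at True.

```python
def get_default_outputs(pipeline_run, step_name="automl_module"):
    """Return the default metrics and model outputs of an AutoML step run."""
    # Deferred import: requires the azureml-pipeline-steps package.
    from azureml.pipeline.steps import AutoMLStepRun

    # find_step_run returns a list of step runs that have the given name.
    step_runs = pipeline_run.find_step_run(step_name)
    if not step_runs:
        raise LookupError("No step run named %r in this pipeline run" % step_name)

    automl_run = AutoMLStepRun(step_runs[0])
    metrics_output = automl_run.get_default_metrics_output()
    model_output = automl_run.get_default_model_output()
    return metrics_output, model_output
```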

For more information about automated machine learning in Azure, see the article What is automated machine learning? For more information about setting up an automated ML experiment without using a pipeline, see the article Configure automated ML experiment in Python.

Methods

create_node

Create a node from this AutoML step and add it to the given graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the required parameters through this method so that the step can be added to a pipeline graph that represents the workflow.

create_node(graph, default_datastore, context)

Parameters

graph
Graph
Required

The graph object to add the node to.

default_datastore
Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore]
Required

The default datastore.

context
<xref:azureml.pipeline.core._GraphContext>
Required

The graph context.

Returns

The created node.

Return type

Node

Attributes

AUTOML_CONFIG_PARAM_NAME

AUTOML_CONFIG_PARAM_NAME = 'AutoMLConfig'

DEFAULT_METRIC_PREFIX

DEFAULT_METRIC_PREFIX = 'default_metrics_'

DEFAULT_MODEL_PREFIX

DEFAULT_MODEL_PREFIX = 'default_model_'