EstimatorStep class

Definition

Creates an Azure ML Pipeline step that runs an Estimator for machine learning model training.

For an example of using EstimatorStep, see the notebook https://aka.ms/pl-estimator.

EstimatorStep(name=None, estimator=None, estimator_entry_script_arguments=None, runconfig_pipeline_params=None, inputs=None, outputs=None, compute_target=None, allow_reuse=True, version=None)

Inheritance
azureml.pipeline.core._python_script_step_base._PythonScriptStepBase
EstimatorStep

Parameters

name
str

The name of the step.

estimator
Estimator

The associated estimator object for this step. Can be a pre-configured estimator such as Chainer, PyTorch, TensorFlow, or SKLearn.

estimator_entry_script_arguments
list[str]

[Required] A list of command-line arguments. If the Estimator's entry script does not accept command-line arguments, set this parameter to an empty list.

runconfig_pipeline_params
dict({(str) : (PipelineParameter)})

Overrides runconfig properties at runtime using key-value pairs, where each key is the name of a runconfig property and each value is the PipelineParameter for that property.

inputs
list[PipelineData or azureml.pipeline.core.pipeline_output_dataset.PipelineOutputDataset or DataReference or Dataset or DatasetDefinition or DatasetConsumptionConfig or PipelineDataset]

A list of inputs to use.

outputs
list[PipelineData or azureml.pipeline.core.pipeline_output_dataset.PipelineOutputDataset]

A list of PipelineData objects.

compute_target
DsvmCompute or AmlCompute or RemoteCompute or str

[Required] The compute target to use.

allow_reuse
bool

Indicates whether the step should reuse previous results when re-run with the same settings. Reuse is enabled by default. If the step contents (scripts/dependencies) as well as inputs and parameters remain unchanged, the output from the previous run of this step is reused. When reusing the step, instead of submitting the job to compute, the results from the previous run are immediately made available to any subsequent steps. If you use Azure Machine Learning datasets as inputs, reuse is determined by whether the dataset's definition has changed, not by whether the underlying data has changed.

version
str

An optional version tag to denote a change in functionality for the module.

Remarks

Note that the arguments to the entry script used in the Estimator object must be specified as a list using the estimator_entry_script_arguments parameter when instantiating an EstimatorStep. The Estimator parameter script_params accepts a dictionary, whereas the estimator_entry_script_arguments parameter expects the arguments as a list.
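The difference between the dictionary form and the list form can be illustrated with a minimal sketch. The helper script_params_to_arguments below is hypothetical and not part of the Azure ML SDK; it assumes each dictionary key is a flag and each value is that flag's argument, with None or an empty string marking a flag that takes no value. Values are passed through unchanged, so pipeline data objects could be used in place of the plain strings shown here.

```python
def script_params_to_arguments(script_params):
    """Flatten a {flag: value} dictionary into a [flag, value, ...] list.

    Hypothetical helper for illustration only; not an Azure ML SDK function.
    """
    arguments = []
    for flag, value in script_params.items():
        arguments.append(flag)
        # A flag mapped to None or "" is treated as a switch with no value.
        if value is not None and value != "":
            arguments.append(value)
    return arguments


# Dictionary in the style accepted by the Estimator's script_params.
script_params = {"--datadir": "data", "--epochs": "10", "--verbose": None}

# List in the style expected by estimator_entry_script_arguments.
arguments = script_params_to_arguments(script_params)
print(arguments)
```

The resulting list keeps the insertion order of the dictionary, which matters when the entry script relies on positional parsing.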

Initializing an EstimatorStep involves specifying a list of DataReference objects with the inputs parameter. In Azure ML Pipelines, a pipeline step can take another step's output or DataReference objects as input. Therefore, when creating an EstimatorStep, the inputs and outputs parameters must be set explicitly, which overrides the inputs parameter specified in the Estimator object.

The best practice for working with EstimatorStep is to use a separate folder for scripts and any dependent files associated with the step, and to specify that folder as the Estimator object's source_directory. Doing so has two benefits. First, it helps reduce the size of the snapshot created for the step because only what is needed for the step is snapshotted. Second, the step's output from a previous run can be reused if there are no changes to the source_directory that would trigger a re-upload of the snapshot.
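To see why an isolated source_directory helps with reuse, consider the following sketch, which fingerprints a folder by hashing its file paths and contents. This is an illustration only: the function directory_fingerprint is hypothetical, and the way Azure ML actually detects snapshot changes is not described here and may differ. The point is that any edit to any file in the folder changes the fingerprint, so unrelated files in a shared folder could invalidate reuse.

```python
import hashlib
import os
import tempfile


def directory_fingerprint(source_directory):
    """Return a SHA-256 digest over relative paths and contents of all files.

    Hypothetical illustration; not the SDK's snapshot mechanism.
    """
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(source_directory):
        dirs.sort()  # visit subdirectories in a deterministic order
        for file_name in sorted(files):
            path = os.path.join(root, file_name)
            relative = os.path.relpath(path, source_directory)
            digest.update(relative.encode("utf-8"))
            with open(path, "rb") as handle:
                digest.update(handle.read())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as step_dir:
    script = os.path.join(step_dir, "train.py")
    with open(script, "w") as handle:
        handle.write("print('training')\n")

    before = directory_fingerprint(step_dir)
    unchanged = directory_fingerprint(step_dir)

    # Editing any file in the folder changes the fingerprint.
    with open(script, "a") as handle:
        handle.write("# edited\n")
    after = directory_fingerprint(step_dir)

print(before == unchanged)
print(before == after)
```

Keeping only the step's own scripts in the folder means the fingerprint in this sketch changes only when the step itself changes.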

The following example shows how to use EstimatorStep in an Azure Machine Learning Pipeline.


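   from azureml.pipeline.steps import EstimatorStep

   # est, input_data, output, and cpu_cluster are not defined in this excerpt;
   # see the full sample for how they are created.
   est_step = EstimatorStep(name="Estimator_Train",
                            estimator=est,
                            # Entry script arguments are passed as a list, not a dictionary.
                            estimator_entry_script_arguments=["--datadir", input_data, "--output", output],
                            runconfig_pipeline_params=None,
                            inputs=[input_data],
                            outputs=[output],
                            compute_target=cpu_cluster)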

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb

Methods

create_node(graph, default_datastore, context)

Create a node from the Estimator step and add it to the specified graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the required parameters through this method so that the step can be added to a pipeline graph that represents the workflow.

Parameters

graph
Graph

The graph object to add the node to.

default_datastore
AbstractAzureStorageDatastore or AzureDataLakeDatastore

The default datastore.

context
_GraphContext

The graph context.

Returns

The created node.

Return type