ModuleStep Class

Creates an Azure Machine Learning pipeline step to run a specific version of a Module.

Module objects define reusable computations, such as scripts or executables, that can be used in different machine learning scenarios and by different users. To use a specific version of a Module in a pipeline, create a ModuleStep. A ModuleStep is a step in a pipeline that uses an existing ModuleVersion.

For an example of using ModuleStep, see the notebook https://aka.ms/pl-modulestep.

Create an Azure ML pipeline step to run a specific version of a Module.

Inheritance
ModuleStep

Constructor

ModuleStep(module=None, version=None, module_version=None, inputs_map=None, outputs_map=None, compute_target=None, runconfig=None, runconfig_pipeline_params=None, arguments=None, params=None, name=None, _workflow_provider=None)

Parameters

module
Module
default value: None

The module used in the step. Provide either the module or the module_version parameter but not both.

version
str
default value: None

The version of the module used in the step.

module_version
ModuleVersion
default value: None

A ModuleVersion of the module used in the step. Provide either the module or the module_version parameter but not both.

inputs_map
dict[str, Union[InputPortBinding, DataReference, PortDataReference, PipelineData, PipelineOutputAbstractDataset, DatasetConsumptionConfig]]
default value: None

A dictionary that maps the names of port definitions of the ModuleVersion to the step's inputs.

outputs_map
dict[str, Union[OutputPortBinding, DataReference, PortDataReference, PipelineData, PipelineOutputAbstractDataset]]
default value: None

A dictionary that maps the names of port definitions of the ModuleVersion to the step's outputs.

compute_target
Union[DsvmCompute, AmlCompute, RemoteCompute, HDInsightCompute, str, tuple]
default value: None

The compute target to use. If unspecified, the target from the runconfig will be used. May be a compute target object or the string name of a compute target on the workspace. Optionally, if the compute target is not available at pipeline creation time, you may specify a tuple of ('compute target name', 'compute target type') to avoid fetching the compute target object (AmlCompute type is 'AmlCompute' and RemoteCompute type is 'VirtualMachine').
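For instance, when the compute target does not yet exist at pipeline creation time, the tuple form can be used. A minimal sketch (the target names below are hypothetical):

```python
# ('compute target name', 'compute target type') tuple form; using it avoids
# fetching the compute target object at pipeline creation time.
# The names "cpu-cluster" and "my-vm" are hypothetical.
aml_target = ("cpu-cluster", "AmlCompute")       # AmlCompute targets use type 'AmlCompute'
remote_target = ("my-vm", "VirtualMachine")      # RemoteCompute targets use type 'VirtualMachine'
```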

runconfig
RunConfiguration
default value: None

An optional RunConfiguration to use. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.

runconfig_pipeline_params
dict[str, PipelineParameter]
default value: None

An override of runconfig properties at runtime, specified as key-value pairs, each with the name of a runconfig property and the PipelineParameter for that property.

Supported values: 'NodeCount', 'MpiProcessCountPerNode', 'TensorflowWorkerCount', 'TensorflowParameterServerCount'
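A minimal sketch of the mapping's shape. A plain-Python stand-in is used here in place of azureml.pipeline.core.PipelineParameter so the structure is clear; in real use, import the actual class from the SDK:

```python
# Stand-in for azureml.pipeline.core.PipelineParameter, used here only to
# illustrate the shape of runconfig_pipeline_params.
class PipelineParameter:
    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

# Override the runconfig's node count at submission time; 'NodeCount' is one
# of the supported property names listed above.
runconfig_pipeline_params = {
    "NodeCount": PipelineParameter(name="node_count", default_value=2),
}
```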

arguments
list[str]
default value: None

A list of command-line arguments for the Python script file. The arguments are delivered to the compute target via the arguments parameter of RunConfiguration. For details on how to handle arguments such as special symbols, see the arguments parameter of RunConfiguration.

params
dict[str, str]
default value: None

A dictionary of name-value pairs.
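As an illustration only (the parameter names below are hypothetical), such a dictionary might look like:

```python
# Hypothetical parameter names; params is a plain dict of string name-value pairs.
params = {
    "learning_rate": "0.01",
    "num_epochs": "10",
}
```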

name
str
default value: None

The name of the step.

_workflow_provider
azureml.pipeline.core._aeva_provider._AevaWorkflowProvider
default value: None

(Internal use only.) The workflow provider.

Remarks

A Module is used to create and manage a reusable computational unit of an Azure Machine Learning pipeline. ModuleStep is the built-in step in Azure Machine Learning used to consume a module. You can either specify exactly which ModuleVersion to use or let Azure Machine Learning resolve the ModuleVersion, following the resolution process defined in the remarks section of the Module class. To define which ModuleVersion is used in a submitted pipeline, provide one of the following when creating a ModuleStep:

  • A ModuleVersion object.

  • A Module object and a version value.

  • A Module object without a version value. In this case, version resolution may vary across submissions.

You must define the mapping between the ModuleStep's inputs and outputs to the ModuleVersion's inputs and outputs.
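The wiring dictionaries used in the example below can be sketched as plain mappings (the port names here are hypothetical; in real use the values would be PipelineData, DataReference, or similar objects, with strings standing in here):

```python
# Keys are the port names defined on the ModuleVersion; values are the step's
# inputs/outputs. Strings stand in for PipelineData/DataReference objects.
middle_step_input_wiring = {
    "input_file_1": "first_sum",       # ModuleVersion input port -> step input
    "input_file_2": "first_prod",
}
middle_step_output_wiring = {
    "output_sum": "middle_sum",        # ModuleVersion output port -> step output
    "output_product": "middle_prod",
}
```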

The following example shows how to create a ModuleStep as a part of pipeline with multiple ModuleStep objects:


   middle_step = ModuleStep(module=module,
                            inputs_map=middle_step_input_wiring,
                            outputs_map=middle_step_output_wiring,
                            runconfig=RunConfiguration(),
                            compute_target=aml_compute,
                            arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                       "--output_sum", middle_sum, "--output_product", middle_prod])

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

Methods

create_node

Create a node from the ModuleStep step and add it to the specified graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the required parameters through this method so that the step can be added to a pipeline graph that represents the workflow.

create_node

Create a node from the ModuleStep step and add it to the specified graph.

This method is not intended to be used directly. When a pipeline is instantiated with this step, Azure ML automatically passes the required parameters through this method so that the step can be added to a pipeline graph that represents the workflow.

create_node(graph, default_datastore, context)

Parameters

graph
Graph
Required

The graph object to add the node to.

default_datastore
Union[AbstractAzureStorageDatastore, AzureDataLakeDatastore]
Required

The default datastore.

context
<xref:azureml.pipeline.core._GraphContext>
Required

The graph context.

Returns

The node object.

Return type

Node