Module Class

Represents a computation unit used in an Azure Machine Learning pipeline.

A module is a collection of files that runs on a compute target, together with a description of its interface. The collection can include scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions; it doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.

Inheritance
builtins.object
Module

Constructor

Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)

Parameters

workspace
Workspace

The workspace object this Module belongs to.

module_id
str

The ID of the Module.

name
str

The name of the Module.

description
str

The description of the Module.

status
str

The status of the Module: 'Active', 'Deprecated', or 'Disabled'.

default_version
str

The default version of the Module.

module_version_list
list

A list of ModuleVersionDescriptor objects.

_module_provider
<xref:azureml.pipeline.core._aeva_provider._AzureMLModuleProvider>

(Internal use only.) The Module provider.

_module_version_provider
<xref:azureml.pipeline.core._aeva_provider._AevaMlModuleVersionProvider>

(Internal use only.) The ModuleVersion provider.

Remarks

A Module acts as a container of its versions. In the following example, a ModuleVersion is created from the publish_python_script method and has two inputs and two outputs. The created ModuleVersion is the default version (is_default is set to True).


   from azureml.pipeline.core.graph import OutputPortDef

   # `datastore` and `module` are created earlier in the full sample.
   out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name, default_datastore_mode="mount",
                           label="Sum of two numbers")
   out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name, default_datastore_mode="mount",
                            label="Product of two numbers")
   # Publish version "1" of the module from calc/calculate.py.
   entry_version = module.publish_python_script("calculate.py", "initial",
                                                inputs=[], outputs=[out_sum, out_prod], params={"initialNum": 12},
                                                version="1", source_directory="./calc")

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

This module can be used when defining a pipeline, in different steps, by using a ModuleStep.

The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:


   from azureml.pipeline.core import PipelineData

   # `first_sum` and `first_prod` are outputs of an earlier step in the full sample.
   middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
   middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount", is_directory=False)
   middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount", is_directory=False)
   middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}

The mapping can then be used when creating the ModuleStep:


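   from azureml.core.runconfig import RunConfiguration
   from azureml.pipeline.steps import ModuleStep

   # `aml_compute` is the compute target created earlier in the full sample.
   middle_step = ModuleStep(module=module,
                            inputs_map=middle_step_input_wiring,
                            outputs_map=middle_step_output_wiring,
                            runconfig=RunConfiguration(), compute_target=aml_compute,
                            arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                       "--output_sum", middle_sum, "--output_product", middle_prod])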

The version of the module to use is resolved upon submission, using the following process:

  • Remove all disabled versions
  • If a specific version was stated, use that, else
  • If a default version was defined for the Module, use that, else
  • If all versions follow semantic versioning without letters, take the highest value, else
  • Take the version of the Module that was updated last
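The resolution steps above can be sketched in plain Python. This is an illustrative model under stated assumptions, not the SDK's implementation: `VersionInfo` and `resolve_version` are hypothetical names, "semantic versioning without letters" is interpreted here as dot-separated integers, and raising an error for a requested version that is disabled or unknown is a guess the source does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class VersionInfo:
    """Hypothetical stand-in for one version of a Module (not an SDK type)."""
    version: str
    status: str        # 'Active', 'Deprecated', or 'Disabled'
    updated: datetime  # time of the last update


def _numeric_key(version: str) -> Optional[Tuple[int, ...]]:
    """Return a tuple of ints for dot-separated numeric versions, else None."""
    parts = version.split(".")
    if all(part.isdecimal() for part in parts):
        return tuple(int(part) for part in parts)
    return None


def resolve_version(versions: List[VersionInfo],
                    requested: Optional[str] = None,
                    default: Optional[str] = None) -> VersionInfo:
    # Step 1: remove all disabled versions.
    candidates = [v for v in versions if v.status != "Disabled"]
    if not candidates:
        raise ValueError("no enabled versions to resolve")
    by_name = {v.version: v for v in candidates}

    # Step 2: a specifically requested version wins.
    if requested is not None:
        if requested not in by_name:
            # Assumption: a disabled or unknown request is an error.
            raise KeyError(requested)
        return by_name[requested]

    # Step 3: otherwise use the default version, if one is defined and enabled.
    if default is not None and default in by_name:
        return by_name[default]

    # Step 4: if every version is purely numeric, take the highest one.
    if all(_numeric_key(v.version) is not None for v in candidates):
        return max(candidates, key=lambda v: _numeric_key(v.version))

    # Step 5: otherwise take the version that was updated last.
    return max(candidates, key=lambda v: v.updated)
```

Comparing tuples of integers rather than strings means that "1.10" ranks above "1.2", which a plain string comparison would get wrong.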

Because the mapping of a node's inputs and outputs to a module's inputs and outputs is defined when the pipeline is created, pipeline submission fails if the version resolved at submission has a different interface from the version resolved at pipeline creation.
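To make the interface constraint concrete, the following sketch compares the port names wired at pipeline creation with those of the version resolved at submission. It is only an illustration: `interface_diff` and the plain dictionaries it takes are hypothetical, and the source does not say how the service itself compares interfaces or which differences it rejects.

```python
from typing import Dict, Iterable, List


def interface_diff(wired: Dict[str, Iterable[str]],
                   resolved: Dict[str, Iterable[str]]) -> List[str]:
    """List port names that differ between two interfaces.

    Both arguments map 'inputs' and 'outputs' to port names. These plain
    dictionaries are hypothetical stand-ins, not SDK objects.
    """
    problems = []
    for kind in ("inputs", "outputs"):
        wired_names = set(wired.get(kind, ()))
        resolved_names = set(resolved.get(kind, ()))
        # Ports that the pipeline wired but the resolved version no longer has.
        for name in sorted(wired_names - resolved_names):
            problems.append(f"{kind}: '{name}' is wired but missing from the resolved version")
        # Ports that the resolved version has but the pipeline never wired.
        for name in sorted(resolved_names - wired_names):
            problems.append(f"{kind}: '{name}' exists in the resolved version but is not wired")
    return problems
```

An empty result means the two interfaces expose the same port names; any entry points to a difference of the kind that could cause a submission to fail.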

The underlying module can be updated with new versions while keeping the default version the same.

Modules are uniquely named within a workspace.

Methods

create

Create the Module.

deprecate

Set the Module to 'Deprecated'.

disable

Set the Module to 'Disabled'.

enable

Set the Module to 'Active'.

get

Get the Module by name or by ID; throws an exception if neither is provided.

get_default

Get the default module version.

get_default_version

Get the default version of the Module.

get_versions

Get all the versions of the Module.

module_def_builder

Create the module definition object that describes the step.

module_version_list

Get the Module version list.

process_source_directory

Process source directory for the step and check that the script exists.

publish

Create a ModuleVersion and add it to the current Module.

publish_adla_script

Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

publish_azure_batch

Create a ModuleVersion that uses Azure Batch and add it to the current Module.

publish_python_script

Create a ModuleVersion that's based on a Python script and add it to the current Module.

resolve

Resolve and return the right ModuleVersion.

set_default_version

Set the default ModuleVersion of the Module.

set_description

Set the description of the Module.

set_name

Set the name of the Module.

create

Create the Module.

create(workspace, name, description, _workflow_provider=None)

Parameters

workspace
Workspace

The workspace in which to create the Module.

name
str

The name of the Module.

description
str

The description of the Module.

_workflow_provider
<xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>
default value: None

(Internal use only.) The workflow provider.

Returns

Module object

Return type

Module

deprecate

Set the Module to 'Deprecated'.

deprecate()

disable

Set the Module to 'Disabled'.

disable()

enable

Set the Module to 'Active'.

enable()
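As a usage sketch, the three status methods (deprecate, disable, enable) can be driven from one helper that maps a target status to the matching call. The helper name `set_module_status` is hypothetical; it only assumes an object that exposes the three methods documented here.

```python
def set_module_status(module, status):
    """Move a Module to 'Active', 'Deprecated', or 'Disabled'.

    `module` is expected to expose enable(), deprecate(), and disable(),
    as the Module class documented here does.
    """
    actions = {
        "Active": module.enable,
        "Deprecated": module.deprecate,
        "Disabled": module.disable,
    }
    if status not in actions:
        raise ValueError(f"unknown status: {status!r}")
    # Call the method that corresponds to the requested status.
    actions[status]()
```

For example, `set_module_status(module, "Deprecated")` is equivalent to calling `module.deprecate()`.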

get

Get the Module by name or by ID; throws an exception if neither is provided.

get(workspace, module_id=None, name=None, _workflow_provider=None)

Parameters

workspace
Workspace

The workspace in which the Module was created.

module_id
str
default value: None

The ID of the Module.

name
str
default value: None

The name of the Module.

_workflow_provider
<xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>
default value: None

(Internal use only.) The workflow provider.

Returns

Module object

Return type

Module
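Because modules are uniquely named within a workspace, a common pattern is to look a Module up by name and create it only when the lookup fails. The following is a minimal sketch of that pattern, assuming that get raises an exception when no Module with the given name exists (the source does not state this). The helper `get_or_create_module` and its `module_cls` parameter are hypothetical; `module_cls` exists only so the sketch can be exercised without a workspace.

```python
def get_or_create_module(workspace, name, description, module_cls=None):
    """Fetch a Module by name, creating it when the lookup fails."""
    if module_cls is None:
        # Imported lazily so the sketch can be read and tested without the SDK.
        from azureml.pipeline.core.module import Module as module_cls
    try:
        return module_cls.get(workspace, name=name)
    except Exception:
        # Assumption: a failed lookup raises. The exception type is not
        # documented here, so this sketch catches broadly.
        return module_cls.create(workspace, name=name, description=description)
```

In real code, narrowing the `except` clause to the exception type actually raised by the service would avoid masking unrelated failures such as authentication errors.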

get_default

Get the default module version.

get_default()

Returns

The default module version.

Return type

ModuleVersion

get_default_version

Get the default version of the Module.

get_default_version()

Returns

The default version of the Module.

Return type

str

get_versions

Get all the versions of the Module.

get_versions(workspace, name, _workflow_provider=None)

Parameters

workspace
Workspace

The workspace in which the Module was created.

name
str

The name of the Module.

_workflow_provider
<xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>
default value: None

(Internal use only.) The workflow provider.

Returns

The list of ModuleVersionDescriptor

Return type

list

module_def_builder

Create the module definition object that describes the step.

module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None, step_type=None, arguments=None, runconfig=None, cloud_settings=None)

Parameters

name
str

The name of the Module.

description
str

The description of the Module.

execution_type
str

The execution type of the Module.

input_bindings
list

The Module input bindings.

output_bindings
list

The Module output bindings.

param_defs
list
default value: None

The Module param definitions.

create_sequencing_ports
bool
default value: True

Indicates whether sequencing ports will be created for the Module.

allow_reuse
bool
default value: True

Indicates whether the Module is available for reuse.

version
str
default value: None

The version of the Module.

module_type
str
default value: None

The Module type.

step_type
str
default value: None

The type of step associated with this Module, for example "PythonScriptStep" or "HyperDriveStep".

arguments
list
default value: None

The annotated arguments list to use when calling this Module.

runconfig
str
default value: None

The runconfig to use for python_script_step.

cloud_settings
str
default value: None

The settings to use for clouds.

Returns

The Module def object.

Return type

ModuleDef

module_version_list

Get the Module version list.

module_version_list()

Returns

The list of ModuleVersionDescriptor

Return type

list

process_source_directory

Process source directory for the step and check that the script exists.

process_source_directory(name, source_directory, script_name)

Parameters

name
str

The name of the step.

source_directory
str

The source directory for the step.

script_name
str

The script name for the step.

Returns

The source directory and hash paths.

Return type

str, list

publish

Create a ModuleVersion and add it to the current Module.

publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None, arguments=None, runconfig=None)

Parameters

description
str

The description of the Module.

execution_type
str

The execution type of the Module. Acceptable values are esCloud, adlcloud, and AzureBatchCloud.

inputs
list

The Module inputs.

outputs
list

The Module outputs.

param_defs
list
default value: None

The Module parameter definitions.

create_sequencing_ports
bool
default value: True

Indicates whether sequencing ports will be created for the Module.

version
str
default value: None

The version of the Module.

is_default
bool
default value: False

Indicates whether the published version is to be the default one.

content_path
str
default value: None

The directory that contains the Module's content.

hash_paths
list
default value: None

A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.

category
str
default value: None

The module version's category.

arguments
list
default value: None

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

runconfig
RunConfiguration
default value: None

An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.

Return type

ModuleVersion

publish_adla_script

Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None)

Parameters

script_name
str

The name of an ADLA script, relative to source_directory.

description
str

The description of the Module version.

inputs
list

The Module input bindings.

outputs
list

The Module output bindings.

params
dict
default value: None

The ModuleVersion params, as name-default_value pairs.

create_sequencing_ports
bool
default value: True

Indicates whether sequencing ports will be created for the Module.

degree_of_parallelism
int
default value: None

The degree of parallelism to use for this job.

priority
int
default value: None

The priority value to use for the current job.

runtime_version
str
default value: None

The runtime version of the Azure Data Lake Analytics (ADLA) engine.

compute_target
AdlaCompute, str
default value: None

The ADLA compute to use for this job.

version
str
default value: None

The version of the module.

is_default
bool
default value: False

Indicates whether the published version is to be the default one.

source_directory
str
default value: None

The source directory.

hash_paths
list
default value: None

A list of paths to hash when checking for changes to the step contents. DEPRECATED: no longer needed.

category
str
default value: None

The module version's category.

arguments
list
default value: None

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

Return type

ModuleVersion

publish_azure_batch

Create a ModuleVersion that uses Azure Batch and add it to the current Module.

publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None, arguments=None)

Parameters

description
str

The description of the Module version.

compute_target
BatchCompute or str

The BatchCompute compute target.

inputs
list

The Module input bindings.

outputs
list

The Module output bindings.

params
dict
default value: None

The ModuleVersion params, as name-default_value pairs.

create_sequencing_ports
bool
default value: True

Indicates whether sequencing ports will be created for the Module.

version
str
default value: None

The version of the Module.

is_default
bool
default value: False

Indicates whether the published version is to be the default one.

create_pool
bool
default value: False

Indicates whether to create the pool before running the jobs.

pool_id
str
default value: None

(Mandatory) The ID of the Pool where the job will run.

delete_batch_job_after_finish
bool
default value: False

Indicates whether to delete the job from Batch account after it's finished.

delete_batch_pool_after_finish
bool
default value: False

Indicates whether to delete the pool after the job finishes.

is_positive_exit_code_failure
bool
default value: True

Indicates whether the job fails if the task exits with a positive exit code.

vm_image_urn
str
default value: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter

If create_pool is True and VM uses VirtualMachineConfiguration, then this parameter indicates the VM image to use. Value format: urn:publisher:offer:sku. Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.

run_task_as_admin
bool
default value: False

Indicates whether the task should run with Admin privileges.

target_compute_nodes
int
default value: 1

If create_pool is True, indicates how many compute nodes will be added to the pool.

vm_size
str
default value: standard_d1_v2

If create_pool is True, indicates the virtual machine size of the compute nodes.

executable
str
default value: None

The name of the command/executable that will be executed as part of the job.

source_directory
str
default value: None

The source directory.

category
str
default value: None

The module version's category.

arguments
list
default value: None

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

Return type

ModuleVersion

publish_python_script

Create a ModuleVersion that's based on a Python script and add it to the current Module.

publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None, runconfig=None)

Parameters

script_name
str

The name of a Python script, relative to source_directory.

description
str

The description of the Module version.

inputs
list

The Module input bindings.

outputs
list

The Module output bindings.

params
dict
default value: None

The ModuleVersion params, as name-default_value pairs.

create_sequencing_ports
bool
default value: True

Indicates whether sequencing ports will be created for the Module.

version
str
default value: None

The version of the Module.

is_default
bool
default value: False

Indicates whether the published version is to be the default one.

source_directory
str
default value: None

The source directory.

hash_paths
list
default value: None

A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.

category
str
default value: None

The module version's category.

arguments
list
default value: None

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).

runconfig
RunConfiguration
default value: None

An optional RunConfiguration. A RunConfiguration can be used to specify additional requirements for the run, such as conda dependencies and a Docker image.

Return type

ModuleVersion
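A short usage sketch for publish_python_script follows. It wraps the call in a hypothetical helper, `publish_script_version`, that passes only parameters listed in the signature above; the `make_default` argument name and the generated description string are illustrative choices, not part of the SDK.

```python
def publish_script_version(module, script_name, version, source_directory,
                           inputs=(), outputs=(), params=None,
                           make_default=False):
    """Publish a Python-script ModuleVersion on an existing Module.

    `module` is expected to expose publish_python_script() with the
    signature documented above.
    """
    return module.publish_python_script(
        script_name,
        f"version {version} of {script_name}",  # illustrative description
        inputs=list(inputs),
        outputs=list(outputs),
        params=params,
        version=version,
        is_default=make_default,
        source_directory=source_directory,
    )
```

For instance, `publish_script_version(module, "calculate.py", "2", "./calc", make_default=True)` would publish a second version of the script from the Remarks example and request that it become the default.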

resolve

Resolve and return the right ModuleVersion.

resolve(version=None)

Parameters

version
default value: None

Returns

The Module version to use.

Return type

ModuleVersion

set_default_version

Set the default ModuleVersion of the Module.

set_default_version(version_id)

Parameters

version_id

Returns

The default version.

Return type

str
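The following sketch combines get_default_version and set_default_version to change the default only when it actually differs. The helper name `promote_version` is hypothetical, and the sketch assumes that get_default_version returns a value comparable to the `version_id` passed to set_default_version, which the source does not spell out.

```python
def promote_version(module, version_id):
    """Make `version_id` the default version unless it already is.

    Returns True when the default was changed, False otherwise.
    """
    # Assumption: the current default is comparable to `version_id`.
    if module.get_default_version() == version_id:
        return False
    module.set_default_version(version_id)
    return True
```

Skipping the call when nothing changes avoids an unnecessary update of the Module.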

set_description

Set the description of the Module.

set_description(description)

Parameters

description
str

The description to set.

set_name

Set the name of the Module.

set_name(name)

Parameters

name
str

The name to set.

Attributes

default_version

Get the default version of the Module.

Returns

The default version string.

Return type

str

description

Get the description of the Module.

Returns

The description string.

Return type

str

id

Get the ID of the Module.

Returns

The id.

Return type

str

name

Get the name of the Module.

Returns

The name.

Return type

str

status

Get the status of the Module.

Returns

The status.

Return type

str