Module Class


Represents a computation unit used in an Azure Machine Learning pipeline.

A module is a collection of files that run on a compute target, together with a description of an interface. The collection of files can include scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions; it doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.
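The separation between an interface (names only) and the values bound to it can be pictured with a minimal pure-Python sketch. ModuleInterface and bind are hypothetical names used only for illustration; they are not part of the azureml SDK, and the sketch does not model snapshots.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModuleInterface:
    """Hypothetical model of a module interface: declared names only, no data."""
    inputs: List[str]
    outputs: List[str]
    param_defaults: Dict[str, object] = field(default_factory=dict)

    def bind(self, inputs: Dict[str, str], outputs: Dict[str, str],
             params: Optional[Dict[str, object]] = None) -> Dict[str, Dict[str, object]]:
        """Hypothetical binding step: pair every declared port with a concrete value."""
        missing = [n for n in self.inputs if n not in inputs]
        missing += [n for n in self.outputs if n not in outputs]
        if missing:
            raise ValueError(f"unbound ports: {missing}")
        # Explicit parameter values override the declared defaults.
        values = {**self.param_defaults, **(params or {})}
        return {"inputs": dict(inputs), "outputs": dict(outputs), "params": values}


# The interface is declared once; concrete values are supplied only when binding.
iface = ModuleInterface(inputs=["in1", "in2"], outputs=["out_sum", "out_prod"],
                        param_defaults={"initialNum": 12})
binding = iface.bind({"in1": "data/a", "in2": "data/b"},
                     {"out_sum": "out/sum", "out_prod": "out/prod"})
```

The same interface object can be bound many times with different data, which mirrors how one module can back several pipeline steps.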

Initialize Module.

Inheritance: builtins.object

Module

Constructor

Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)

Parameters

workspace: Workspace

Required

The workspace object this Module belongs to.

module_id: str

Required

The ID of the Module.

name: str

Required

The name of the Module.

description: str

Required

The description of the Module.

status: str

Required

The new status of the Module: 'Active', 'Deprecated', or 'Disabled'.

default_version: str

Required

The default version of the Module.

module_version_list: list

Required

A list of ModuleVersionDescriptor objects.

_module_provider: <xref:azureml.pipeline.core._aeva_provider._AzureMLModuleProvider>

default value: None

(Internal use only.) The Module provider.

_module_version_provider: <xref:azureml.pipeline.core._aeva_provider._AevaMlModuleVersionProvider>

default value: None

(Internal use only.) The ModuleVersion provider.


Remarks

A Module acts as a container of its versions. In the following example, a ModuleVersion is created with the publish_python_script method; it has no inputs and two outputs. The created ModuleVersion is the default version (is_default is set to True).


   # Assumes `module` was created with Module.create(...) and `datastore` is a registered datastore.
   from azureml.pipeline.core.graph import OutputPortDef

   out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name, default_datastore_mode="mount",
                           label="Sum of two numbers")
   out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name, default_datastore_mode="mount",
                            label="Product of two numbers")
   entry_version = module.publish_python_script("calculate.py", "initial",
                                                inputs=[], outputs=[out_sum, out_prod], params={"initialNum": 12},
                                                version="1", is_default=True, source_directory="./calc")

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

When defining a pipeline, this module can be used in different steps by means of a ModuleStep.

The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:


   # first_sum and first_prod are PipelineData outputs of the preceding step.
   from azureml.pipeline.core import PipelineData

   middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
   middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount", is_directory=False)
   middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount", is_directory=False)
   middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

The mapping can then be used when creating the ModuleStep:


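   from azureml.core.runconfig import RunConfiguration
   from azureml.pipeline.steps import ModuleStep

   # aml_compute is an existing compute target in the workspace.
   middle_step = ModuleStep(module=module,
                            inputs_map=middle_step_input_wiring,
                            outputs_map=middle_step_output_wiring,
                            runconfig=RunConfiguration(), compute_target=aml_compute,
                            arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                       "--output_sum", middle_sum, "--output_product", middle_prod])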

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

The version of the module to use is resolved upon submission, using the following process:

1. Remove all disabled versions.
2. If a specific version was stated, use that; otherwise
3. if a default version was defined for the Module, use that; otherwise
4. if all versions follow semantic versioning without letters, take the highest value; otherwise
5. take the version of the Module that was updated last.
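The resolution rules can be sketched in pure Python. This is an illustration of the documented order, not the SDK's implementation: VersionInfo and resolve_version are hypothetical, the per-version updated timestamp is assumed to be available, and a default version that has been disabled is assumed to be skipped rather than used.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VersionInfo:
    """Hypothetical stand-in for a module version and its metadata."""
    version: str
    status: str      # 'Active', 'Deprecated', or 'Disabled'
    updated: float   # last-update timestamp (assumed to be available)


# "Semantic versioning without letters": dot-separated integers such as 1.10.0.
_NUMERIC_VERSION = re.compile(r"^\d+(\.\d+)*$")


def resolve_version(versions: List[VersionInfo], requested: Optional[str] = None,
                    default: Optional[str] = None) -> VersionInfo:
    # 1. Remove all disabled versions (deprecated ones stay eligible).
    candidates = [v for v in versions if v.status != "Disabled"]
    if not candidates:
        raise ValueError("no enabled versions to resolve")
    by_name = {v.version: v for v in candidates}
    # 2. A specifically requested version wins.
    if requested is not None:
        if requested not in by_name:
            raise ValueError(f"version {requested!r} is missing or disabled")
        return by_name[requested]
    # 3. Otherwise use the Module's default version (assumed skipped if disabled).
    if default is not None and default in by_name:
        return by_name[default]
    # 4. If every version is purely numeric, take the highest, comparing numerically.
    if all(_NUMERIC_VERSION.match(v.version) for v in candidates):
        return max(candidates, key=lambda v: tuple(int(p) for p in v.version.split(".")))
    # 5. Otherwise take the version that was updated last.
    return max(candidates, key=lambda v: v.updated)
```

Comparing the parts as integers matters in step 4: compared as plain strings, "1.10.0" would sort before "1.9.0".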

Because the mapping of a node's inputs and outputs to the module's inputs and outputs is defined when the Pipeline is created, pipeline submission will fail if the version resolved upon submission has a different interface from the version resolved upon pipeline creation.
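The kind of mismatch that causes this failure can be illustrated with a small pure-Python check. find_wiring_mismatches is a hypothetical helper, not an SDK function; it only reports wired port names that the resolved version does not declare, and the string values in the maps are placeholders for PipelineData objects.

```python
from typing import Dict, Iterable, List


def find_wiring_mismatches(inputs_map: Dict[str, object], outputs_map: Dict[str, object],
                           input_ports: Iterable[str], output_ports: Iterable[str]) -> List[str]:
    """List wired port names that the resolved version does not declare."""
    problems = []
    for kind, wiring, ports in (("input", inputs_map, input_ports),
                                ("output", outputs_map, output_ports)):
        for name in sorted(set(wiring) - set(ports)):
            problems.append(f"{kind} port {name!r} is not defined by the resolved version")
    return problems


# Wiring fixed at pipeline creation, shaped like middle_step_input_wiring/output_wiring.
inputs_map = {"in1": "first_sum", "in2": "first_prod"}
outputs_map = {"out_sum": "middle_sum", "out_prod": "middle_prod"}
# Suppose the version resolved at submission renamed out_prod to out_product.
problems = find_wiring_mismatches(inputs_map, outputs_map,
                                  input_ports={"in1", "in2"},
                                  output_ports={"out_sum", "out_product"})
```

Pinning a specific version in the step avoids this class of failure, since the resolution then cannot drift between creation and submission.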

The underlying module can be updated with new versions while keeping the default version the same.

Modules are uniquely named within a workspace.
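Unique naming and the rule that new versions leave the default untouched can be sketched with a hypothetical in-memory registry. ModuleRegistry and the module name AddAndMultiply are illustrative only; the real SDK stores this state in the workspace, not in a local object.

```python
from typing import Dict, List, Optional


class ModuleRegistry:
    """Hypothetical per-workspace registry: one entry per unique module name."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._defaults: Dict[str, Optional[str]] = {}

    def create(self, name: str) -> None:
        if name in self._versions:
            raise ValueError(f"a module named {name!r} already exists in this workspace")
        self._versions[name] = []
        self._defaults[name] = None

    def publish(self, name: str, version: str, is_default: bool = False) -> None:
        self._versions[name].append(version)
        # A new version replaces the default only when explicitly requested.
        if is_default:
            self._defaults[name] = version

    def default_version(self, name: str) -> Optional[str]:
        return self._defaults[name]

    def versions(self, name: str) -> List[str]:
        return list(self._versions[name])


registry = ModuleRegistry()
registry.create("AddAndMultiply")
registry.publish("AddAndMultiply", "1", is_default=True)
registry.publish("AddAndMultiply", "2")  # default stays "1"
```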

Methods

create	Create the Module.
deprecate	Set the Module to 'Deprecated'.
disable	Set the Module to 'Disabled'.
enable	Set the Module to 'Active'.
get	Get the Module by name or by ID; throws an exception if neither is provided.
get_default	Get the default module version.
get_default_version	Get the default version of the Module.
get_versions	Get all the versions of the Module.
module_def_builder	Create the module definition object that describes the step.
module_version_list	Get the Module version list.
process_source_directory	Process the source directory for the step and check that the script exists.
publish	Create a ModuleVersion and add it to the current Module.
publish_adla_script	Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.
publish_azure_batch	Create a ModuleVersion that uses Azure Batch and add it to the current Module.
publish_python_script	Create a ModuleVersion that's based on a Python script and add it to the current Module.
resolve	Resolve and return the right ModuleVersion.
set_default_version	Set the default ModuleVersion of the Module.
set_description	Set the description of the Module.
set_name	Set the name of the Module.

create

Create the Module.

static create(workspace, name, description, _workflow_provider=None)

Parameters

workspace: Workspace

Required

The workspace in which to create the Module.

name: str

Required

The name of the Module.

description: str

Required

The description of the Module.

_workflow_provider: <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>

default value: None

(Internal use only.) The workflow provider.

Returns

Module object

Return type

Module

deprecate

Set the Module to 'Deprecated'.

deprecate()

disable

Set the Module to 'Disabled'.

disable()

enable

Set the Module to 'Active'.

enable()
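The effect of deprecate, disable, and enable on the Module's status can be modeled with a small enum. ModuleStatus and ModuleState are hypothetical illustration classes, not SDK types; the only facts taken from this page are the three status strings and which method sets which one.

```python
from enum import Enum


class ModuleStatus(Enum):
    ACTIVE = "Active"
    DEPRECATED = "Deprecated"
    DISABLED = "Disabled"


class ModuleState:
    """Hypothetical model of the status set by deprecate(), disable(), and enable()."""

    def __init__(self, status: str) -> None:
        # ModuleStatus(...) raises ValueError for anything but the three documented values.
        self.status = ModuleStatus(status)

    def deprecate(self) -> None:
        self.status = ModuleStatus.DEPRECATED

    def disable(self) -> None:
        self.status = ModuleStatus.DISABLED

    def enable(self) -> None:
        self.status = ModuleStatus.ACTIVE


state = ModuleState("Active")
state.deprecate()
```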

get

Get the Module by name or by ID; throws an exception if neither is provided.
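The argument check can be sketched as a lookup over a plain dictionary. get_module is a hypothetical helper, not Module.get itself; the precedence of the ID over the name when both are given, and returning None when nothing matches, are assumptions made for the sketch.

```python
from typing import Dict, Optional


def get_module(modules_by_id: Dict[str, dict], module_id: Optional[str] = None,
               name: Optional[str] = None) -> Optional[dict]:
    """Hypothetical lookup by ID or by name; returns None when nothing matches."""
    if module_id is None and name is None:
        raise ValueError("either module_id or name must be provided")
    if module_id is not None:
        # Assumption: the ID takes precedence when both are given.
        return modules_by_id.get(module_id)
    return next((m for m in modules_by_id.values() if m["name"] == name), None)


modules = {"id-1": {"id": "id-1", "name": "AddAndMultiply"}}
```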

static get(workspace, module_id=None, name=None, _workflow_provider=None)

Parameters

workspace: Workspace

Required

The workspace in which the Module was created.

module_id: str

default value: None

The ID of the Module.

name: str

default value: None

The name of the Module.

_workflow_provider: <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>

default value: None

(Internal use only.) The workflow provider.

get_versions

Get all the versions of the Module.

static get_versions(workspace, name, _workflow_provider=None)

Parameters

workspace: Workspace

Required

The workspace in which the Module was created.

name: str

Required

The name of the Module.

_workflow_provider: <xref:azureml.pipeline.core._aeva_provider._AevaWorkflowProvider>

default value: None

(Internal use only.) The workflow provider.

Returns

The list of ModuleVersionDescriptor objects.

Return type

list

module_def_builder

Create the module definition object that describes the step.

static module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None, step_type=None, arguments=None, runconfig=None, cloud_settings=None)

Parameters

name: str

Required

The name of the Module.

description: str

Required

The description of the Module.

execution_type: str

Required

The execution type of the Module.

input_bindings: list

Required

The Module input bindings.

output_bindings: list

Required

The Module output bindings.

param_defs: list

default value: None

The Module parameter definitions.

create_sequencing_ports: bool

default value: True

Indicates whether sequencing ports will be created for the Module.

allow_reuse: bool

default value: True

Indicates whether the Module will be available to be reused.

version: str

default value: None

The version of the Module.

module_type: str

default value: None

The Module type.

step_type: str

default value: None

Type of step associated with this module, e.g. "PythonScriptStep", "HyperDriveStep", etc.

arguments: list

default value: None

The annotated arguments list to use when calling this module.

runconfig: str

default value: None

The run configuration that will be used for python_script_step.

cloud_settings: str

default value: None

The settings that will be used for clouds.

Returns

The Module def object.

Return type

ModuleDef

Exceptions

ValueError

module_version_list

Get the Module version list.

module_version_list()

Returns

The list of ModuleVersionDescriptor objects.

Return type

list

process_source_directory

Process the source directory for the step and check that the script exists.

static process_source_directory(name, source_directory, script_name)

Parameters

name: str

Required

The name of the step.

source_directory: str

Required

The source directory for the step.

script_name: str

Required

The script name for the step.

Returns

The source directory and hash paths.

Return type

str, list

Exceptions

ValueError
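The documented behavior (validate that the script exists, return the source directory and hash paths, raise ValueError otherwise) can be sketched with the standard library. check_source_directory is a hypothetical re-implementation, not the SDK's code; falling back to the current working directory and listing only the script as a hash path are assumptions.

```python
import os
import tempfile
from typing import List, Optional, Tuple


def check_source_directory(name: str, source_directory: Optional[str],
                           script_name: str) -> Tuple[str, List[str]]:
    """Hypothetical re-implementation: validate the script, return (dir, hash paths)."""
    if not script_name:
        raise ValueError(f"step {name!r}: script_name is required")
    # Assumption: a missing source directory means the current working directory.
    source_directory = source_directory or os.getcwd()
    script_path = os.path.join(source_directory, script_name)
    if not os.path.isfile(script_path):
        raise ValueError(f"step {name!r}: script {script_path!r} does not exist")
    # Assumption: only the script itself is listed as a hash path.
    return source_directory, [script_path]


# Usage with a throwaway directory that contains calculate.py.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "calculate.py"), "w").close()
    demo = check_source_directory("calc", tmp, "calculate.py")
```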

publish

Create a ModuleVersion and add it to the current Module.

publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None, arguments=None, runconfig=None)

Parameters

description: str

Required

The description of the Module.

execution_type: str

Required

The execution type of the Module. Acceptable values are esCloud, adlcloud, and AzureBatchCloud.

inputs: list

Required

The Module inputs.

outputs: list

Required

The Module outputs.

param_defs: list

default value: None

The Module parameter definitions.

create_sequencing_ports: bool

default value: True

Indicates whether sequencing ports will be created for the Module.

version: str

default value: None

The version of the Module.

is_default: bool

default value: False

Indicates whether the published version is to be the default one.

content_path: str

default value: None

Return type

ModuleVersion

Exceptions

Exception

publish_adla_script

Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None)

Parameters

script_name: str

Required

The name of an ADLA script, relative to source_directory.

description: str

Required

The description of the Module version.

inputs: list

Required

The Module input bindings.

outputs: list

Required

The Module output bindings.

params: dict

default value: None

The ModuleVersion params, as name-default_value pairs.

create_sequencing_ports: bool

default value: True

Indicates whether sequencing ports will be created for the Module.

degree_of_parallelism: int

default value: None

The degree of parallelism to use for this job.

priority: int

default value: None

The priority value to use for the current job.

runtime_version: str

default value: None

The runtime version of the Azure Data Lake Analytics (ADLA) engine.

compute_target: AdlaCompute, str

default value: None

The ADLA compute to use for this job.

version: str

default value: None

The version of the module.

is_default: bool

default value: False

Indicates whether the published version is to be the default one.

source_directory: str

default value: None

Return type

ModuleVersion

publish_azure_batch

Create a ModuleVersion that uses Azure Batch and add it to the current Module.

publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None, arguments=None)

Parameters

description: str

Required

The description of the Module version.

compute_target: BatchCompute or str

Required

The BatchCompute compute target.

inputs: list

Required

The Module input bindings.

outputs: list

Required

The Module output bindings.

params: dict

default value: None

The ModuleVersion params, as name-default_value pairs.

create_sequencing_ports: bool

default value: True

Indicates whether sequencing ports will be created for the Module.

version: str

default value: None

The version of the Module.

is_default: bool

default value: False

Indicates whether the published version is to be the default one.

create_pool: bool

default value: False

Indicates whether to create the pool before running the jobs.

pool_id: str

default value: None

(Mandatory) The ID of the Pool where the job will run.

delete_batch_job_after_finish: bool

default value: False

Indicates whether to delete the job from the Batch account after it's finished.

delete_batch_pool_after_finish: bool

default value: False

Indicates whether to delete the pool after the job finishes.

is_positive_exit_code_failure: bool

default value: True

Indicates whether the job fails if the task exits with a positive exit code.

vm_image_urn: str

default value: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter

If create_pool is True and VM uses VirtualMachineConfiguration, then this parameter indicates the VM image to use. Value format: urn:publisher:offer:sku. Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.
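The urn:publisher:offer:sku format can be checked before submitting with a small parser. parse_vm_image_urn is a hypothetical helper written for this page, not an SDK function; it only splits and validates the string.

```python
from typing import Dict


def parse_vm_image_urn(urn: str) -> Dict[str, str]:
    """Split a urn:publisher:offer:sku string into its three named parts."""
    parts = urn.split(":")
    if len(parts) != 4 or parts[0] != "urn" or not all(parts[1:]):
        raise ValueError(f"expected 'urn:publisher:offer:sku', got {urn!r}")
    _, publisher, offer, sku = parts
    return {"publisher": publisher, "offer": offer, "sku": sku}


default_image = parse_vm_image_urn("urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter")
```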

run_task_as_admin: bool

default value: False

Indicates whether the task should run with Admin privileges.

target_compute_nodes: int

default value: 1

If create_pool is True, indicates how many compute nodes will be added to the pool.

vm_size: str

default value: standard_d1_v2

If create_pool is True, indicates the virtual machine size of the compute nodes.

executable: str

default value: None

The name of the command/executable that will be executed as part of the job.

source_directory: str

default value: None

The source directory.

category: str

default value: None

The module version's category.

arguments: list

default value: None

Arguments to use when calling the module. Arguments can be strings, input references (InputPortDef), output references (OutputPortDef), and pipeline parameters (PipelineParameter).
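One way to picture an annotated argument list is as a mix of literal strings and placeholders that are filled in at run time. The sketch below is hypothetical throughout: PortRef and ParamRef stand in for the port definitions and PipelineParameter, and it assumes a port reference becomes the port's path and a parameter becomes its submitted value or, failing that, its default. The /mnt paths are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class PortRef:
    """Hypothetical stand-in for an InputPortDef or OutputPortDef reference."""
    name: str


@dataclass(frozen=True)
class ParamRef:
    """Hypothetical stand-in for a PipelineParameter."""
    name: str
    default_value: object


Argument = Union[str, PortRef, ParamRef]


def render_arguments(arguments: List[Argument], port_paths: Dict[str, str],
                     param_values: Dict[str, object]) -> List[str]:
    """Turn an annotated argument list into plain command-line strings."""
    rendered = []
    for arg in arguments:
        if isinstance(arg, str):
            rendered.append(arg)  # literal strings pass through unchanged
        elif isinstance(arg, PortRef):
            rendered.append(port_paths[arg.name])  # assumed: replaced by the port's path
        elif isinstance(arg, ParamRef):
            # Assumed: the submitted value if present, else the parameter's default.
            rendered.append(str(param_values.get(arg.name, arg.default_value)))
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return rendered


args = ["--file_num1", PortRef("in1"), "--output_sum", PortRef("out_sum"),
        "--initial", ParamRef("initialNum", 12)]
paths = {"in1": "/mnt/in1", "out_sum": "/mnt/out_sum"}
```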

Return type

ModuleVersion

Exceptions

ValueError

publish_python_script

Create a ModuleVersion that's based on a Python script and add it to the current Module.

publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None, arguments=None, runconfig=None)

Parameters

script_name: str

Required

The name of a Python script, relative to source_directory.

description: str

Required

The description of the Module version.

inputs: list

Required

The Module input bindings.

outputs: list

Required

The Module output bindings.

params: dict

default value: None

The ModuleVersion params, as name-default_value pairs.

create_sequencing_ports: bool

default value: True

Indicates whether sequencing ports will be created for the Module.

version: str

default value: None

The version of the Module.

is_default: bool

default value: False

Indicates whether the published version is to be the default one.

source_directory: str

default value: None

Return type

ModuleVersion

resolve

Resolve and return the right ModuleVersion.

resolve(version=None)

Parameters

version

default value: None

Returns

The Module version to use.

Return type

ModuleVersion

set_default_version

Set the default ModuleVersion of the Module.

set_default_version(version_id)

Parameters

version_id

Required

Returns

The default version.

Return type

str

Exceptions

Exception

set_description

Set the description of the Module.

set_description(description)

Parameters

description: str

Required

The description to set.

Exceptions

Exception

set_name

Set the name of the Module.

set_name(name)

Parameters

name: str

Required

The name to set.

Exceptions

Exception
