Module class

Definition

Represents a computation unit used in an Azure Machine Learning pipeline.

A module is a collection of files that will run on a compute target, together with a description of an interface. The collection of files can be scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions. It doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.

Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)
Inheritance
builtins.object
Module

Parameters

workspace
Workspace

The workspace object this Module belongs to.

module_id
str

The ID of the Module.

name
str

The name of the Module.

description
str

The description of the Module.

status
str

The new status of the Module: 'Active', 'Deprecated', or 'Disabled'.

default_version
str

The default version of the Module.

module_version_list
list

A list of ModuleVersionDescriptor objects.

_module_provider
_AzureMLModuleProvider object

The Module provider.

_module_version_provider
_AevaMlModuleVersionProvider object

The ModuleVersion provider.

Remarks

A Module acts as a container of its versions. In the following example, a ModuleVersion is created from the publish_python_script method and has two inputs and two outputs. The created ModuleVersion is the default version (is_default is set to True).


   out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name, default_datastore_mode="mount",
                           label="Sum of two numbers")
   out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name, default_datastore_mode="mount",
                            label="Product of two numbers")
   entry_version = module.publish_python_script("calculate.py", "initial",
                                                inputs=[], outputs=[out_sum, out_prod], params={"initialNum": 12},
                                                version="1", source_directory="./calc")

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

This module can be used when defining a pipeline, in different steps, by using a ModuleStep.

The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:


   middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
   middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount", is_directory=False)
   middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount", is_directory=False)
   middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb

The mapping can then be used when creating the ModuleStep:


   middle_step = ModuleStep(module=module,
                            inputs_map=middle_step_input_wiring,
                            outputs_map=middle_step_output_wiring,
                            runconfig=RunConfiguration(), compute_target=aml_compute,
                            arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                       "--output_sum", middle_sum, "--output_product", middle_prod])

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb
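The arguments list passes the wired data to the script as command-line flags. As a rough illustration only, a script receiving those flags could parse them with argparse as sketched here. This is not the notebook's calculate.py; the helper name parse_step_args and the sample paths are assumptions made for the example.

```python
import argparse


def parse_step_args(argv):
    """Parse the four flags from the ModuleStep arguments list (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Hypothetical step script")
    parser.add_argument("--file_num1", required=True, help="location of the first input")
    parser.add_argument("--file_num2", required=True, help="location of the second input")
    parser.add_argument("--output_sum", required=True, help="location for the sum output")
    parser.add_argument("--output_product", required=True, help="location for the product output")
    return parser.parse_args(argv)


# The paths below are made up; in a real run the script would read sys.argv[1:].
args = parse_step_args([
    "--file_num1", "/tmp/first_sum",
    "--file_num2", "/tmp/first_prod",
    "--output_sum", "/tmp/middle_sum",
    "--output_product", "/tmp/middle_prod",
])
print(args.file_num1, args.output_product)
```

Marking every flag as required makes a wiring mistake surface as an immediate argument error rather than as a missing file later in the script.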

The version of the module to use is resolved upon submission, using the following process:

  • Remove all disabled versions
  • If a specific version was stated, use that, else
  • If a default version was defined for the Module, use that, else
  • If all versions follow semantic versioning without letters, take the highest value, else
  • Take the version of the Module that was updated last
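The resolution steps above can be sketched in plain Python. This is a minimal illustration, not the SDK's implementation: VersionInfo and resolve_version are hypothetical names, "semantic versioning without letters" is interpreted here as dot-separated integers, and raising ValueError for an unknown or disabled requested version is an assumption.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class VersionInfo:
    """Hypothetical stand-in for a module version record; not an SDK class."""
    version: str
    status: str        # 'Active', 'Deprecated', or 'Disabled'
    updated: datetime


# Assumption: "without letters" means dot-separated integers such as "1" or "1.10".
_NUMERIC = re.compile(r"\d+(\.\d+)*")


def resolve_version(versions: List[VersionInfo],
                    requested: Optional[str] = None,
                    default: Optional[str] = None) -> VersionInfo:
    # Step 1: remove all disabled versions.
    candidates = [v for v in versions if v.status != "Disabled"]
    by_name = {v.version: v for v in candidates}
    # Step 2: a specifically stated version wins.
    if requested is not None:
        if requested not in by_name:
            raise ValueError(f"version {requested!r} is not available")
        return by_name[requested]
    # Step 3: otherwise use the default version, if one is defined and usable.
    if default in by_name:
        return by_name[default]
    if not candidates:
        raise ValueError("no usable versions")
    # Step 4: if every version is purely numeric, take the highest value.
    if all(_NUMERIC.fullmatch(v.version) for v in candidates):
        return max(candidates, key=lambda v: tuple(int(p) for p in v.version.split(".")))
    # Step 5: otherwise take the version that was updated last.
    return max(candidates, key=lambda v: v.updated)


versions = [
    VersionInfo("1", "Active", datetime(2020, 1, 1)),
    VersionInfo("2", "Disabled", datetime(2020, 2, 1)),
    VersionInfo("1.10", "Active", datetime(2020, 3, 1)),
]
picked = resolve_version(versions)
```

Comparing versions as integer tuples rather than as strings keeps "1.10" ahead of "1.9", which a plain string comparison would get wrong.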

Note that the mapping of a node's inputs and outputs to a module's inputs and outputs is defined when the Pipeline is created. If the version resolved upon submission has a different interface from the one resolved upon pipeline creation, the pipeline submission will fail.
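One way to picture the interface check is to compare the port names used in the wiring with the port names declared by the resolved version. The sketch below is an assumption-laden illustration: find_interface_mismatches is a hypothetical helper, and it treats any difference in port names as a mismatch, which may be stricter or looser than what the service actually checks.

```python
from typing import Dict, Iterable, List


def find_interface_mismatches(inputs_map: Dict[str, object],
                              outputs_map: Dict[str, object],
                              version_inputs: Iterable[str],
                              version_outputs: Iterable[str]) -> List[str]:
    """Return readable problems; an empty list means the wiring fits (hypothetical helper)."""
    problems = []
    checks = (("input", set(inputs_map), set(version_inputs)),
              ("output", set(outputs_map), set(version_outputs)))
    for kind, wired, declared in checks:
        # Ports named in the wiring that the resolved version does not declare.
        for name in sorted(wired - declared):
            problems.append(f"{kind} '{name}' is wired but not declared by the resolved version")
        # Ports the resolved version declares that the wiring never mentions.
        for name in sorted(declared - wired):
            problems.append(f"{kind} '{name}' is declared by the resolved version but not wired")
    return problems


# Port names follow the wiring dictionaries shown earlier; the values are placeholders.
problems = find_interface_mismatches(
    {"in1": "first_sum", "in2": "first_prod"},
    {"out_sum": "middle_sum", "out_prod": "middle_prod"},
    version_inputs=["in1"],                 # a newer version that dropped "in2"
    version_outputs=["out_sum", "out_prod"],
)
```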

The underlying module can be updated with new versions while keeping the default version the same.

Modules are uniquely named within a workspace.

Methods

create(workspace, name, description, _workflow_provider=None)

Create the Module.

deprecate()

Set the Module to 'Deprecated'.

disable()

Set the Module to 'Disabled'.

enable()

Set the Module to 'Active'.

get(workspace, module_id=None, name=None, _workflow_provider=None)

Get the Module by name or by ID; throws an exception if neither is provided.

get_default()

Get the default module version.

get_default_version()

Get the default version of the Module.

get_versions(workspace, name, _workflow_provider=None)

Get all the versions of the Module.

module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None)

Create the module definition object that describes the step.

module_version_list()

Get the Module version list.

process_source_directory_and_hash_paths(name, source_directory, script_name, hash_paths)

Process source directory and hash paths for the step.

publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None)

Create a ModuleVersion and add it to the current Module.

publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)

Create a ModuleVersion that's based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None)

Create a ModuleVersion that uses Azure Batch and add it to the current Module.

publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)

Create a ModuleVersion that's based on a Python script and add it to the current Module.

resolve(version=None)

Resolve and return the right ModuleVersion.

set_default_version(version_id)

Set the default ModuleVersion of the Module.

set_description(description)

Set the description of the Module.

set_name(name)

Set the name of the Module.

create(workspace, name, description, _workflow_provider=None)

Create the Module.

Parameters

workspace
Workspace

The workspace in which to create the Module.

name
str

The name of the Module.

description
str

The description of the Module.

_workflow_provider
_AevaWorkflowProvider object

The workflow provider.

default value: None

Returns

Module object

Return type

Module

deprecate()

Set the Module to 'Deprecated'.

disable()

Set the Module to 'Disabled'.

enable()

Set the Module to 'Active'.

get(workspace, module_id=None, name=None, _workflow_provider=None)

Get the Module by name or by ID; throws an exception if neither is provided.

Parameters

workspace
Workspace

The workspace in which the Module was created.

module_id
str

The ID of the Module.

default value: None
name
str

The name of the Module.

default value: None
_workflow_provider
_AevaWorkflowProvider object

The workflow provider.

default value: None
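Because module_id and name both default to None, a caller has to supply one of them. The sketch below shows a pre-check a caller might run before calling get. It is only an illustration: check_lookup_arguments is a hypothetical helper, the exception type is an assumption, and preferring the ID when both values are given is a choice made for this sketch rather than documented behavior.

```python
def check_lookup_arguments(module_id=None, name=None):
    """Return the (key, value) pair a lookup would use (hypothetical helper)."""
    if module_id is None and name is None:
        # Mirrors the documented rule: a name or an ID must be provided.
        raise ValueError("Provide module_id or name to look up a Module")
    # Assumption of this sketch: prefer the ID when both are supplied.
    if module_id is not None:
        return ("module_id", module_id)
    return ("name", name)


lookup = check_lookup_arguments(name="calculator")
```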

Returns

Module object

Return type

Module

get_default()

Get the default module version.

Returns

The default module version.

Return type

ModuleVersion

get_default_version()

Get the default version of the Module.

Returns

The default version of the module.

Return type

str

get_versions(workspace, name, _workflow_provider=None)

Get all the versions of the Module.

Parameters

workspace
Workspace

The workspace in which the Module was created.

name
str

The name of the Module.

_workflow_provider
_AevaWorkflowProvider object

The workflow provider.

default value: None

Returns

The list of ModuleVersionDescriptor objects.

Return type

list

module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None)

Create the module definition object that describes the step.

Parameters

name
str

The name of the Module.

description
str

The description of the Module.

execution_type
str

The execution type of the Module.

input_bindings
list

The Module input bindings.

output_bindings
list

The Module output bindings.

param_defs
list

The Module param definitions.

default value: None
create_sequencing_ports
bool

Indicates whether sequencing ports will be created for the Module.

default value: True
allow_reuse
bool

Indicates whether the Module will be available to be reused.

default value: True
version
str

The version of the Module.

default value: None
module_type
str

The Module type.

default value: None

Returns

The Module def object.

Return type

ModuleDef

module_version_list()

Get the Module version list.

Returns

The list of ModuleVersionDescriptor objects.

Return type

list

process_source_directory_and_hash_paths(name, source_directory, script_name, hash_paths)

Process source directory and hash paths for the step.

Parameters

name
str

The name of the step.

source_directory
str

The source directory for the step.

script_name
str

The script name for the step.

hash_paths
list

The hash paths to use when determining the Module fingerprint.

Returns

The source directory and hash paths.

Return type

str, list

publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None)

Create a ModuleVersion and add it to the current Module.

Parameters

description
str

The description of the Module.

execution_type
str

The execution type of the Module.

inputs
list

The Module inputs.

outputs
list

The Module outputs.

param_defs
list

The Module parameter definitions.

default value: None
create_sequencing_ports
bool

Indicates whether sequencing ports will be created for the Module.

default value: True
version
str

The version of the Module.

default value: None
is_default
bool

Indicates whether the published version is to be the default one.

default value: False
content_path
str

The directory that contains the files for the Module version.

default value: None
hash_paths
list

A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.

default value: None
category
str

The module version's category.

default value: None
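The hash_paths description relies on hashing step contents to detect changes. The following is a minimal sketch of that idea using only the standard library. It is not the SDK's fingerprint algorithm: fingerprint_paths is a hypothetical helper, SHA-256 is an arbitrary choice, and the sketch does not honor .amlignore or .gitignore.

```python
import hashlib
import os
import tempfile


def fingerprint_paths(paths):
    """Return a SHA-256 hex digest over file names and contents (hypothetical helper)."""
    files = []
    for path in paths:
        if os.path.isdir(path):
            for root, dirs, names in os.walk(path):
                dirs.sort()  # keep the traversal order deterministic
                files.extend(os.path.join(root, name) for name in sorted(names))
        else:
            files.append(path)
    digest = hashlib.sha256()
    for file_path in sorted(files):
        digest.update(file_path.encode("utf-8"))
        with open(file_path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


# Demonstration on a throwaway directory: same contents give the same digest,
# and editing a file changes it.
with tempfile.TemporaryDirectory() as source_dir:
    script = os.path.join(source_dir, "calculate.py")
    with open(script, "w") as handle:
        handle.write("print(1)\n")
    before = fingerprint_paths([source_dir])
    unchanged = fingerprint_paths([source_dir])
    with open(script, "a") as handle:
        handle.write("print(2)\n")
    after = fingerprint_paths([source_dir])
```

A caller could compare the digest from the previous run with the current one and reuse earlier results only when the two match.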

Return type

ModuleVersion

publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)

Create a ModuleVersion that's based on Azure Data Lake Analytics (ADLA) and add it to the current Module.

Parameters

script_name
str

The name of an ADLA script, relative to source_directory.

description
str

The description of the Module version.

inputs
list

The Module input bindings.

outputs
list

The Module output bindings.

params
dict

The ModuleVersion params, as name-default_value pairs.

default value: None
create_sequencing_ports
bool

Indicates whether sequencing ports will be created for the Module.

default value: True
degree_of_parallelism
int

The degree of parallelism to use for this job.

default value: None
priority
int

The priority value to use for the current job.

default value: None
runtime_version
str

The runtime version of the Azure Data Lake Analytics (ADLA) engine.

default value: None
compute_target
AdlaCompute or str

The ADLA compute to use for this job.

default value: None
version
str

The version of the module.

default value: None
is_default
bool

Indicates whether the published version is to be the default one.

default value: False
source_directory
str

The directory that contains the script.

default value: None
hash_paths
list

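A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.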

default value: None
category
str

The module version's category.

default value: None

Return type

ModuleVersion

publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None)

Create a ModuleVersion that uses Azure Batch and add it to the current Module.

Parameters

description
str

The description of the Module version.

compute_target
BatchCompute or str

The BatchCompute compute target.

inputs
list

The Module input bindings.

outputs
list

The Module output bindings.

params
dict

The ModuleVersion params, as name-default_value pairs.

default value: None
create_sequencing_ports
bool

Indicates whether sequencing ports will be created for the Module.

default value: True
version
str

The version of the Module.

default value: None
is_default
bool

Indicates whether the published version is to be the default one.

default value: False
create_pool
bool

Indicates whether to create the pool before running the jobs.

default value: False
pool_id
str

(Mandatory) The ID of the Pool where the job will run.

default value: None
delete_batch_job_after_finish
bool

Indicates whether to delete the job from the Batch account after it's finished.

default value: False
delete_batch_pool_after_finish
bool

Indicates whether to delete the pool after the job finishes.

default value: False
is_positive_exit_code_failure
bool

Indicates whether the job fails if the task exits with a positive code.

default value: True
vm_image_urn
str

If create_pool is True and VM uses VirtualMachineConfiguration, then this parameter indicates the VM image to use. Value format: urn:publisher:offer:sku. Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.

default value: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter
run_task_as_admin
bool

Indicates whether the task should run with Admin privileges.

default value: False
target_compute_nodes
int

If create_pool is True, indicates how many compute nodes will be added to the pool.

default value: 1
vm_size
str

If create_pool is True, indicates the virtual machine size of the compute nodes.

default value: standard_d1_v2
executable
str

The name of the command/executable that will be executed as part of the job.

default value: None
source_directory
str

The source directory.

default value: None
category
str

The module version's category.

default value: None
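The vm_image_urn parameter follows the urn:publisher:offer:sku format. As a small illustration, the sketch below splits such a value into its parts and rejects strings that do not fit the format. The names VmImage and parse_vm_image_urn are hypothetical and are not part of the SDK.

```python
from typing import NamedTuple


class VmImage(NamedTuple):
    """The three parts of a vm_image_urn value (hypothetical container)."""
    publisher: str
    offer: str
    sku: str


def parse_vm_image_urn(urn: str) -> VmImage:
    parts = urn.split(":")
    # Expect exactly "urn" plus three non-empty fields.
    if len(parts) != 4 or parts[0] != "urn" or not all(parts[1:]):
        raise ValueError(f"expected 'urn:publisher:offer:sku', got {urn!r}")
    return VmImage(*parts[1:])


# The documented default value of vm_image_urn.
image = parse_vm_image_urn("urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter")
```

Validating the value before publishing can catch a malformed URN locally instead of at pool creation time.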

Return type

ModuleVersion

publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)

Create a ModuleVersion that's based on a Python script and add it to the current Module.

Parameters

script_name
str

The name of a Python script, relative to source_directory.

description
str

The description of the Module version.

inputs
list

The Module input bindings.

outputs
list

The Module output bindings.

params
dict

The ModuleVersion params, as name-default_value pairs.

default value: None
create_sequencing_ports
bool

Indicates whether sequencing ports will be created for the Module.

default value: True
version
str

The version of the Module.

default value: None
is_default
bool

Indicates whether the published version is to be the default one.

default value: False
source_directory
str

The directory that contains the script.

default value: None
hash_paths
list

A list of paths to hash when checking for changes to the step contents. If there are no changes detected, the pipeline will reuse the step contents from a previous run. By default, the contents of the source_directory are hashed (except files listed in .amlignore or .gitignore). DEPRECATED: no longer needed.

default value: None
category
str

The module version's category.

default value: None

Return type

ModuleVersion

resolve(version=None)

Resolve and return the right ModuleVersion.

Parameters

version
default value: None

Returns

The Module version to use.

Return type

ModuleVersion

set_default_version(version_id)

Set the default ModuleVersion of the Module.

Parameters

version_id

Returns

The default version.

Return type

str

set_description(description)

Set the description of the Module.

Parameters

description
str

The description to set.

set_name(name)

Set the name of the Module.

Parameters

name
str

The name to set.

Attributes

default_version

Get the default version of the Module.

Returns

The default version string.

Return type

str

description

Get the description of the Module.

Returns

The description string.

Return type

str

id

Get the ID of the Module.

Returns

The id.

Return type

str

name

Get the name of the Module.

Returns

The name.

Return type

str

status

Get the status of the Module.

Returns

The status.

Return type

str