Module class
Definition
Represents a computation unit used in an Azure Machine Learning pipeline.
A module is a collection of files that will run on a compute target, together with a description of an interface. The collection of files can be scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions without binding them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.
Module(workspace, module_id, name, description, status, default_version, module_version_list, _module_provider=None, _module_version_provider=None)
- Inheritance
-
builtins.object → Module
Parameters
- workspace
- Workspace
The workspace object this Module belongs to.
- module_id
- str
The ID of the Module.
- name
- str
The name of the Module.
- description
- str
The description of the Module.
- status
- str
The new status of the Module: 'Active', 'Deprecated', or 'Disabled'.
- default_version
- str
The default version of the Module.
- module_version_list
- list
A list of ModuleVersionDescriptor objects.
- _module_provider
- _AzureMLModuleProvider object
(Internal use only.) The Module provider.
- _module_version_provider
- _AevaMlModuleVersionProvider object
(Internal use only.) The ModuleVersion provider.
Remarks
A Module acts as a container of its versions. In the following example, a ModuleVersion is created
with the publish_python_script method and has
two inputs and two outputs. The created ModuleVersion is the default version (is_default
is set to True).
out_sum = OutputPortDef(name="out_sum", default_datastore_name=datastore.name,
                        default_datastore_mode="mount",
                        label="Sum of two numbers")
out_prod = OutputPortDef(name="out_prod", default_datastore_name=datastore.name,
                         default_datastore_mode="mount",
                         label="Product of two numbers")
entry_version = module.publish_python_script("calculate.py", "initial",
                                             inputs=[], outputs=[out_sum, out_prod],
                                             params={"initialNum": 12},
                                             version="1", source_directory="./calc")
Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb
This module can be used in different steps when defining a pipeline, by using a ModuleStep.
The following sample shows how to wire the data used in the pipeline to inputs and outputs of a ModuleVersion using PipelineData:
middle_step_input_wiring = {"in1": first_sum, "in2": first_prod}
middle_sum = PipelineData("middle_sum", datastore=datastore, output_mode="mount",
                          is_directory=False)
middle_prod = PipelineData("middle_prod", datastore=datastore, output_mode="mount",
                           is_directory=False)
middle_step_output_wiring = {"out_sum": middle_sum, "out_prod": middle_prod}
The mapping can then be used when creating the ModuleStep:
middle_step = ModuleStep(module=module,
                         inputs_map=middle_step_input_wiring,
                         outputs_map=middle_step_output_wiring,
                         runconfig=RunConfiguration(), compute_target=aml_compute,
                         arguments=["--file_num1", first_sum, "--file_num2", first_prod,
                                    "--output_sum", middle_sum, "--output_product", middle_prod])
The version of the module to use is resolved upon submission, following this process:
- Remove all disabled versions
- If a specific version was stated, use it; else
- If a default version is defined for the Module, use it; else
- If all versions follow semantic versioning without letters, take the highest value; else
- Take the version of the Module that was updated last
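The resolution order above can be sketched in plain Python. The tuple layout here is a simplified, hypothetical stand-in for ModuleVersionDescriptor data, not the SDK's actual types:

```python
import re

def resolve_version(versions, requested=None):
    """Sketch of the resolution order: each entry in `versions` is a
    (version_string, status, updated_at, is_default) tuple."""
    # 1. Remove all disabled versions.
    candidates = [v for v in versions if v[1] != "Disabled"]
    # 2. If a specific version was requested, use it.
    if requested is not None:
        for v in candidates:
            if v[0] == requested:
                return v[0]
        raise ValueError("requested version %s not found" % requested)
    # 3. If a default version is defined, use it.
    for v in candidates:
        if v[3]:
            return v[0]
    # 4. If all versions are numeric semantic versions, take the highest.
    if all(re.fullmatch(r"\d+(\.\d+)*", v[0]) for v in candidates):
        return max(candidates,
                   key=lambda v: tuple(int(p) for p in v[0].split(".")))[0]
    # 5. Otherwise take the most recently updated version.
    return max(candidates, key=lambda v: v[2])[0]
```

Note how step 4 compares versions numerically, so "1.10" is higher than "1.9"; a plain string comparison would get this wrong.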
Note that because the mapping of a node's inputs and outputs to a module's inputs and outputs is defined upon pipeline creation, the pipeline submission will fail if the version resolved upon submission has a different interface from the one resolved upon pipeline creation.
The underlying module can be updated with new versions while keeping the default version the same.
Modules are uniquely named within a workspace.
Methods
create(workspace, name, description, _workflow_provider=None)
Create the Module.
create(workspace, name, description, _workflow_provider=None)
Parameters
- workspace
- Workspace
The workspace in which to create the Module.
- name
- str
The name of the Module.
- description
- str
The description of the Module.
- _workflow_provider
- _AevaWorkflowProvider object
(Internal use only.) The workflow provider.
Returns
Module object
Return type
deprecate()
Set the Module to 'Deprecated'.
deprecate()
disable()
Set the Module to 'Disabled'.
disable()
enable()
Set the Module to 'Active'.
enable()
get(workspace, module_id=None, name=None, _workflow_provider=None)
Get the Module by name or by ID; throws an exception if neither is provided.
get(workspace, module_id=None, name=None, _workflow_provider=None)
Parameters
- workspace
- Workspace
The workspace the Module belongs to.
- _workflow_provider
- _AevaWorkflowProvider object
(Internal use only.) The workflow provider.
Returns
Module object
Return type
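The lookup contract of get(), by ID first, then by name, failing when neither is given, can be sketched as follows. The registry dict is a hypothetical stand-in for the workspace's module store, not a real SDK structure:

```python
def get_module(registry, module_id=None, name=None):
    """Sketch of get()'s contract: resolve by ID, then by name,
    and raise when neither argument is supplied."""
    if module_id is None and name is None:
        raise ValueError("Either module_id or name must be provided")
    if module_id is not None:
        return registry["by_id"][module_id]
    return registry["by_name"][name]
```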
get_default()
Get the default module version.
get_default()
Returns
The default module version.
Return type
get_default_version()
Get the default version of Module.
get_default_version()
Returns
The default version of the Module.
Return type
get_versions(workspace, name, _workflow_provider=None)
Get all the versions of the Module.
get_versions(workspace, name, _workflow_provider=None)
Parameters
- workspace
- Workspace
The workspace the Module was created on.
- name
- str
The name of the Module.
- _workflow_provider
- _AevaWorkflowProvider object
(Internal use only.) The workflow provider.
Returns
The list of ModuleVersionDescriptor
Return type
module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None)
Create the module definition object that describes the step.
module_def_builder(name, description, execution_type, input_bindings, output_bindings, param_defs=None, create_sequencing_ports=True, allow_reuse=True, version=None, module_type=None)
Parameters
- name
- str
The name of the Module.
- description
- str
The description of the Module.
- execution_type
- str
The execution type of the Module.
- input_bindings
- list
The Module input bindings.
- output_bindings
- list
The Module output bindings.
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
Returns
The Module def object.
Return type
module_version_list()
Get the Module version list.
module_version_list()
Returns
The list of ModuleVersionDescriptor
Return type
process_source_directory_and_hash_paths(name, source_directory, script_name, hash_paths)
Process source directory and hash paths for the step.
process_source_directory_and_hash_paths(name, source_directory, script_name, hash_paths)
Parameters
- name
- str
The name of the step.
- source_directory
- str
The source directory for the step.
- script_name
- str
The script name for the step.
- hash_paths
- list
The hash paths to use when determining the Module fingerprint.
Returns
The source directory and hash paths.
Return type
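The hash paths feed the fingerprint that drives step reuse. As a rough illustration, a directory fingerprint can be computed by hashing file names and contents; this is a hypothetical sketch, not the SDK's internal hashing scheme:

```python
import hashlib
import os

def fingerprint_source_directory(source_directory, extra_hash_paths=()):
    """Illustrative sketch: hash a source directory's file names and
    contents to produce a stable fingerprint, so any change to the
    step contents yields a different value."""
    digest = hashlib.sha256()
    for root_path in [source_directory] + list(extra_hash_paths):
        if os.path.isfile(root_path):
            files = [root_path]
        else:
            # Sort for a deterministic walk order.
            files = sorted(
                os.path.join(dirpath, f)
                for dirpath, _, filenames in os.walk(root_path)
                for f in filenames
            )
        for file_path in files:
            digest.update(file_path.encode())  # renames alter the hash
            with open(file_path, "rb") as fh:
                digest.update(fh.read())       # content edits alter the hash
    return digest.hexdigest()
```

If two runs produce the same fingerprint, the step contents are unchanged and a previous run's results could be reused.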
publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None)
Create a ModuleVersion and add it to the current Module.
publish(description, execution_type, inputs, outputs, param_defs=None, create_sequencing_ports=True, version=None, is_default=False, content_path=None, hash_paths=None, category=None)
Parameters
- description
- str
The description of the Module.
- execution_type
- str
The execution type of the Module.
- inputs
- list
The Module inputs.
- outputs
- list
The Module outputs.
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- is_default
- bool
Indicates whether the published version is to be the default one.
- hash_paths
- list
A list of paths to hash when checking for changes to the step contents. If no changes are detected, the pipeline reuses the step contents from a previous run. By default, the contents of source_directory are hashed (except files listed in .amlignore or .gitignore).
DEPRECATED: no longer needed.
Return type
publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)
Create a ModuleVersion based on Azure Data Lake Analytics (ADLA) and add it to the current Module.
publish_adla_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, degree_of_parallelism=None, priority=None, runtime_version=None, compute_target=None, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)
Parameters
- script_name
- str
The name of an ADLA script, relative to source_directory.
- description
- str
The description of the Module version.
- inputs
- list
The Module input bindings.
- outputs
- list
The Module output bindings.
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- runtime_version
- str
The runtime version of the Azure Data Lake Analytics (ADLA) engine.
- is_default
- bool
Indicates whether the published version is to be the default one.
Return type
publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None)
Create a ModuleVersion that uses Azure batch and add it to the current Module.
publish_azure_batch(description, compute_target, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, create_pool=False, pool_id=None, delete_batch_job_after_finish=False, delete_batch_pool_after_finish=False, is_positive_exit_code_failure=True, vm_image_urn='urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter', run_task_as_admin=False, target_compute_nodes=1, vm_size='standard_d1_v2', executable=None, source_directory=None, category=None)
Parameters
- description
- str
The description of the Module version.
- compute_target
- BatchCompute, str
The BatchCompute compute target.
- inputs
- list
The Module input bindings.
- outputs
- list
The Module output bindings.
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- is_default
- bool
Indicates whether the published version is to be the default one.
- delete_batch_job_after_finish
- bool
Indicates whether to delete the job from Batch account after it's finished.
- delete_batch_pool_after_finish
- bool
Indicates whether to delete the pool after the job finishes.
- is_positive_exit_code_failure
- bool
Indicates whether the job fails if the task exits with a positive exit code.
- vm_image_urn
- str
If create_pool is True and the VM uses VirtualMachineConfiguration, this parameter indicates the VM image to use. Value format: urn:publisher:offer:sku.
Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter.
- run_task_as_admin
- bool
Indicates whether the task should run with Admin privileges.
- target_compute_nodes
- int
If create_pool is True, indicates how many compute nodes will be added to the pool.
- vm_size
- str
If create_pool is True, indicates the virtual machine size of the compute nodes.
- executable
- str
The name of the command/executable that will be executed as part of the job.
Return type
publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)
Create a ModuleVersion that's based on a Python script and add it to the current Module.
publish_python_script(script_name, description, inputs, outputs, params=None, create_sequencing_ports=True, version=None, is_default=False, source_directory=None, hash_paths=None, category=None)
Parameters
- script_name
- str
The name of a Python script, relative to source_directory.
- description
- str
The description of the Module version.
- inputs
- list
The Module input bindings.
- outputs
- list
The Module output bindings.
- create_sequencing_ports
- bool
Indicates whether sequencing ports will be created for the Module.
- is_default
- bool
Indicates whether the published version is to be the default one.
- hash_paths
- list
A list of paths to hash when checking for changes to the step contents. If no changes are detected, the pipeline reuses the step contents from a previous run. By default, the contents of source_directory are hashed (except files listed in .amlignore or .gitignore).
DEPRECATED: no longer needed.
Return type
resolve(version=None)
Resolve and return the right ModuleVersion.
resolve(version=None)
Parameters
- version
Returns
The Module version to use.
Return type
set_default_version(version_id)
Set the default ModuleVersion of the Module.
set_default_version(version_id)
Parameters
- version_id
Returns
The default version.
Return type
set_description(description)
Set the description of Module.
set_description(description)
Parameters
- description
- str
The description to set.
set_name(name)
Set the name of Module.
set_name(name)
Parameters
- name
- str
The name to set.
Attributes
default_version
description
id
name
status