core Package

Contains core functionality for Azure Machine Learning pipelines, which are configurable machine learning workflows.

Azure Machine Learning pipelines allow you to create reusable machine learning workflows that can be used as a template for your machine learning scenarios. This package contains the core functionality for working with Azure ML pipelines and is typically used along with the classes in the steps package.

A machine learning pipeline is represented by a collection of PipelineStep objects that can be sequenced and parallelized, or created with explicit dependencies between steps. Pipeline steps are used to define a Pipeline object which represents the workflow to execute. You can create and work with pipelines in a Jupyter Notebook or any other IDE with the Azure ML SDK installed.

Azure ML pipelines enable you to focus on machine learning rather than infrastructure. To get started building a pipeline, see https://aka.ms/pl-first-pipeline.
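
As an illustration, here is a minimal sketch of defining two steps with an explicit dependency and submitting them as a pipeline. It assumes an existing Workspace object ws, a compute target, and two local scripts (prepare.py and train.py); adjust the names to your own setup.

   from azureml.core import Experiment
   from azureml.pipeline.core import Pipeline
   from azureml.pipeline.steps import PythonScriptStep

   # Two steps with an explicit dependency; ws, compute_target, and the
   # scripts are placeholders for your own workspace, compute, and code.
   prep_step = PythonScriptStep(name="prepare", script_name="prepare.py",
                                compute_target=compute_target, source_directory=".")
   train_step = PythonScriptStep(name="train", script_name="train.py",
                                 compute_target=compute_target, source_directory=".")
   train_step.run_after(prep_step)

   pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
   pipeline_run = Experiment(ws, "pipeline-intro").submit(pipeline)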

For more information about the benefits of the Machine Learning Pipeline and how it is related to other pipelines offered by Azure, see What are ML pipelines in Azure Machine Learning service?

Modules

builder

Defines classes for building an Azure Machine Learning pipeline.

A pipeline graph is composed of pipeline steps (PipelineStep), optional pipeline data (PipelineData) produced or consumed in each step, and an optional step execution sequence (StepSequence).

graph

Defines classes for constructing Azure Machine Learning pipeline graphs.

Azure ML pipeline graphs are created for Pipeline objects when you use PipelineStep (and derived classes), PipelineData, and PipelineDataset objects. In typical use cases, you will not need to use the classes in this module directly.

A pipeline run graph consists of module nodes which represent basic units such as a datasource or step. Nodes can have input ports and output ports, and associated parameters. Edges define relationships between two node ports in a graph.

module

Contains classes for creating and managing reusable computational units of an Azure Machine Learning pipeline.

Modules allow you to create computational units in a Pipeline, which can have inputs and outputs and rely on parameters and an environment configuration to operate. A module can be versioned and used in different Azure Machine Learning pipelines, unlike PipelineStep (and derived classes), which are used in one Pipeline.

Modules are designed to be reused in several pipelines and can evolve to adapt their computation logic to different use cases. A step in a pipeline can be used in fast iterations to improve an algorithm; once the goal is achieved, the algorithm is usually published as a module to enable reuse.

module_step_base

Contains functionality to add a step to a pipeline using a version of a Module.

pipeline

Defines the class for creating reusable Azure Machine Learning workflows.

pipeline_draft

Defines classes for managing mutable pipelines.

pipeline_endpoint

Defines classes for managing pipelines including versioning and endpoints.

pipeline_output_dataset

Contains functionality for promoting an intermediate output to an Azure Machine Learning Dataset.

By default, intermediate data (output) in a pipeline does not become an Azure Machine Learning Dataset. To promote intermediate data to an Azure Machine Learning Dataset, call the as_dataset method on the PipelineData class to return a PipelineOutputFileDataset object. From a PipelineOutputFileDataset object, you can then create a PipelineOutputTabularDataset object.
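
For example, a minimal sketch of promoting intermediate output to a dataset might look like the following; the datastore and the delimited file format are assumptions that depend on your data.

   from azureml.pipeline.core import PipelineData

   step_output = PipelineData("processed", datastore=datastore)
   file_dataset = step_output.as_dataset()                 # PipelineOutputFileDataset
   tabular_dataset = file_dataset.parse_delimited_files()  # PipelineOutputTabularDataset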

run

Defines classes for submitted pipelines, including classes for checking status and retrieving run details.

schedule

Defines classes for scheduling submissions of Azure Machine Learning Pipelines.

Classes

InputPortBinding

Defines a binding from a source to an input of a pipeline step.

An InputPortBinding can be used as an input to a step. The source can be a PipelineData, PortDataReference, DataReference, PipelineDataset, or OutputPortBinding.

InputPortBinding is useful for specifying the name of the step input if it should be different from the name of the bind object (for example, to avoid duplicate input/output names, or because the step script needs an input to have a certain name). It can also be used to specify the bind_mode for PythonScriptStep inputs.

Initialize InputPortBinding.
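
A hedged sketch of binding a PipelineData object to a step input under a different name; the datastore and the consuming step are placeholders.

   from azureml.pipeline.core import InputPortBinding, PipelineData

   raw_data = PipelineData("raw_data", datastore=datastore)

   # The step script expects an input named "training_data", so bind under that name.
   training_input = InputPortBinding(name="training_data",
                                     bind_object=raw_data,
                                     bind_mode="download")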

Module

Represents a computation unit used in an Azure Machine Learning pipeline.

A module is a collection of files which will run on a compute target and a description of an interface. The collection of files can be scripts, binaries, or any other files required to execute on the compute target. The module interface describes inputs, outputs, and parameter definitions; it doesn't bind them to specific values or data. A module has a snapshot associated with it, which captures the collection of files defined for the module.

Initialize Module.
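
As a small, hedged sketch (assuming an existing Workspace object ws), a module can be registered with Module.create and later resolved to a specific ModuleVersion when building steps; the name and description below are examples.

   from azureml.pipeline.core import Module

   # Register a reusable module in the workspace.
   process_module = Module.create(ws, name="ProcessData",
                                  description="Cleans and splits raw data")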

ModuleVersion

Represents the actual computation unit within a Module.

You should not use this class directly. Instead, use one of the publish methods of the Module class.

Initialize ModuleVersion.

ModuleVersionDescriptor

Defines the version and ID of a ModuleVersion.

Initialize ModuleVersionDescriptor.

OutputPortBinding

Defines a named output of a pipeline step.

OutputPortBinding can be used to specify the type of data which will be produced by a step and how the data will be produced. It can be used with InputPortBinding to specify that the step output is a required input of another step.

Initialize OutputPortBinding.

Pipeline

Represents a collection of steps which can be executed as a reusable Azure Machine Learning workflow.

Use a Pipeline to create and manage workflows that stitch together various machine learning phases. Each machine learning phase, such as data preparation and model training, can consist of one or more steps in a Pipeline.

For an overview of why and when to use Pipelines, see https://aka.ms/pl-concept.

For an overview on constructing a Pipeline, see https://aka.ms/pl-first-pipeline.

Initialize Pipeline.
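
For example, assuming a list of already-defined steps, a Pipeline can be validated before submission and optionally published for later reuse; the names below are placeholders.

   from azureml.pipeline.core import Pipeline

   pipeline = Pipeline(workspace=ws, steps=steps)
   pipeline.validate()   # surfaces issues such as disconnected inputs before submission

   published = pipeline.publish(name="training-pipeline",
                                description="End-to-end training workflow")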

PipelineData

Represents intermediate data in an Azure Machine Learning pipeline.

Data used in a pipeline can be produced by one step and consumed in another step by providing a PipelineData object as an output of one step and an input of one or more subsequent steps.
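
For example, a minimal sketch of wiring intermediate data between two steps (the datastore, compute target, and scripts are placeholders):

   from azureml.pipeline.core import PipelineData
   from azureml.pipeline.steps import PythonScriptStep

   processed_data = PipelineData("processed_data", datastore=datastore)

   # The first step produces the data; the second consumes it.
   process_step = PythonScriptStep(name="process", script_name="process.py",
                                   arguments=["--output_folder", processed_data],
                                   outputs=[processed_data], compute_target=compute_target)
   train_step = PythonScriptStep(name="train", script_name="train.py",
                                 arguments=["--input_folder", processed_data],
                                 inputs=[processed_data], compute_target=compute_target)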

Note: if you are using PipelineData, make sure the directory you write to exists.

The following Python example ensures the directory exists. Suppose you have an output port named output_folder in one pipeline step and you want to write data to a relative path within that folder:


   import os

   # Create the full output directory, including the subfolder, before writing.
   os.makedirs(os.path.join(args.output_folder, 'relative_path'), exist_ok=True)
   with open(os.path.join(args.output_folder, 'relative_path', 'file_name'), 'w') as f:
       f.write('example data')

PipelineData uses DataReference underneath, which is no longer the recommended approach for data access and delivery; use OutputFileDatasetConfig instead. You can find a sample here: Pipeline using OutputFileDatasetConfig.

Initialize PipelineData.

PipelineDataset

Acts as an adapter for Dataset and Pipeline.

Note

This class is deprecated. To learn how to use a dataset with a pipeline, see https://aka.ms/pipeline-with-dataset.

This is an internal class. You should not create this class directly but rather call the as_* instance methods on the Dataset or the OutputDatasetConfig classes.

PipelineDraft

Represents a mutable pipeline which can be used to submit runs and create Published Pipelines.

Use PipelineDrafts to iterate on Pipelines. PipelineDrafts can be created from scratch, another PipelineDraft, or existing pipelines: Pipeline, PublishedPipeline, or PipelineRun.

Initialize PipelineDraft.

PipelineEndpoint

Represents a Pipeline workflow that can be triggered from a unique endpoint URL.

PipelineEndpoints can be used to create new versions of a PublishedPipeline while maintaining the same endpoint. PipelineEndpoints are uniquely named within a workspace.

Using the endpoint attribute of a PipelineEndpoint object, you can trigger new pipeline runs from external applications with REST calls. For information about how to authenticate when calling REST endpoints, see https://aka.ms/pl-restep-auth.

For more information about creating and running machine learning pipelines, see https://aka.ms/pl-first-pipeline.

Initialize PipelineEndpoint.
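
A hedged sketch of publishing a pipeline behind a named endpoint (ws and pipeline are assumed to already exist):

   from azureml.pipeline.core import PipelineEndpoint

   endpoint = PipelineEndpoint.publish(workspace=ws, name="TrainingEndpoint",
                                       pipeline=pipeline,
                                       description="Endpoint for the training workflow")
   rest_url = endpoint.endpoint   # POST to this URL (with authentication) to trigger runs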

PipelineParameter

Defines a parameter in a pipeline execution.

Use PipelineParameters to construct versatile Pipelines which can be resubmitted later with varying parameter values.

Initialize pipeline parameters.
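
For example, a minimal sketch of declaring a parameter with a default value and passing it to a step argument (the script and compute target are placeholders):

   from azureml.pipeline.core import PipelineParameter
   from azureml.pipeline.steps import PythonScriptStep

   learning_rate = PipelineParameter(name="learning_rate", default_value=0.01)
   train_step = PythonScriptStep(name="train", script_name="train.py",
                                 arguments=["--learning_rate", learning_rate],
                                 compute_target=compute_target)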

PipelineRun

Represents a run of a Pipeline.

This class can be used to manage, check status, and retrieve run details once a pipeline run is submitted. Use get_steps to retrieve the StepRun objects which are created by the pipeline run. Other uses include retrieving the Graph object associated with the pipeline run, fetching the status of the pipeline run, and waiting for run completion.

Initialize a Pipeline run.
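
A minimal sketch, assuming experiment and pipeline objects already exist:

   # Submit the pipeline, wait for it to finish, then inspect the step runs.
   pipeline_run = experiment.submit(pipeline)
   pipeline_run.wait_for_completion(show_output=True)

   for step_run in pipeline_run.get_steps():
       print(step_run.id, step_run.get_status())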

PipelineStep

Represents an execution step in an Azure Machine Learning pipeline.

Pipelines are constructed from multiple pipeline steps, which are distinct computational units in the pipeline. Each step can run independently and use isolated compute resources. Each step typically has its own named inputs, outputs, and parameters.

The PipelineStep class is the base class from which other built-in step classes designed for common scenarios inherit, such as PythonScriptStep, DataTransferStep, and HyperDriveStep.

For an overview of how Pipelines and PipelineSteps relate, see What are ML Pipelines.

Initialize PipelineStep.

PortDataReference

Models data associated with an output of a completed StepRun.

A PortDataReference object can be used to download the output data which was produced by a StepRun. It can also be used as a step input in a future Pipeline.

Initialize PortDataReference.
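
A hedged sketch of downloading a completed step's output; the step run object and the output name "processed_data" are placeholders.

   step_output = step_run.get_output("processed_data")      # StepRunOutput
   port_data_ref = step_output.get_port_data_reference()    # PortDataReference
   port_data_ref.download(local_path="./downloaded_data")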

PublishedPipeline

Represents a Pipeline to be submitted without the Python code which constructed it.

In addition, a PublishedPipeline can be used to resubmit a Pipeline with different PipelineParameter values and inputs.

Initialize PublishedPipeline.

:param endpoint: The REST endpoint URL to submit pipeline runs for this pipeline.
:type endpoint: str
:param total_run_steps: The number of steps in this pipeline.
:type total_run_steps: int
:param workspace: The workspace of the published pipeline.
:type workspace: azureml.core.Workspace
:param continue_on_step_failure: Whether to continue execution of other steps in the PipelineRun if a step fails. The default is false.
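
For example, a hedged sketch of retrieving a published pipeline by its ID (placeholder shown) and resubmitting it with a different parameter value:

   from azureml.core import Experiment
   from azureml.pipeline.core import PublishedPipeline

   published = PublishedPipeline.get(ws, id="<published-pipeline-id>")
   run = Experiment(ws, "resubmit-demo").submit(published,
                                                pipeline_parameters={"learning_rate": 0.05})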

Schedule

Defines a schedule on which to submit a pipeline.

Once a Pipeline is published, a Schedule can be used to submit the Pipeline at a specified interval or when changes to a Blob storage location are detected.

Initialize Schedule.
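
A hedged sketch of submitting a published pipeline once a day (ws and the published pipeline ID are placeholders):

   from azureml.pipeline.core import Schedule, ScheduleRecurrence

   recurrence = ScheduleRecurrence(frequency="Day", interval=1)
   schedule = Schedule.create(ws, name="daily-training",
                              pipeline_id="<published-pipeline-id>",
                              experiment_name="scheduled-training",
                              recurrence=recurrence)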

ScheduleRecurrence

Defines the frequency, interval and start time of a pipeline Schedule.

ScheduleRecurrence also allows you to specify the time zone and the hours or minutes or week days for the recurrence.

Initialize a schedule recurrence.

It also allows you to specify the time zone and the hours, minutes, or week days for the recurrence.

StepRun

A run of a step in a Pipeline.

This class can be used to manage, check status, and retrieve run details once the parent pipeline run is submitted and the pipeline has submitted the step run.

Initialize a StepRun.

StepRunOutput

Represents an output created by a StepRun in a Pipeline.

StepRunOutput can be used to access the PortDataReference created by the step.

Initialize StepRunOutput.

StepSequence

Represents a list of steps in a Pipeline and the order in which to execute them.

Use a StepSequence when initializing a pipeline to create a workflow that contains steps to run in a specific order.

Initialize StepSequence.
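
For example, a minimal sketch of running three already-defined steps strictly in order:

   from azureml.pipeline.core import Pipeline, StepSequence

   step_sequence = StepSequence(steps=[prepare_step, train_step, evaluate_step])
   pipeline = Pipeline(workspace=ws, steps=step_sequence)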

TrainingOutput

Defines a specialized output of certain PipelineSteps for use in a pipeline.

TrainingOutput enables an automated machine learning metric or model to be made available as a step output to be consumed by another step in an Azure Machine Learning Pipeline. It can be used with AutoMLStep or HyperDriveStep.

Initialize TrainingOutput.

:param model_file: The specific model file to be included in the output. For HyperDriveStep only.
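
A hedged sketch of exposing the best model from an AutoMLStep as a step output; the datastore is a placeholder and the AutoML step configuration is omitted.

   from azureml.pipeline.core import PipelineData, TrainingOutput

   # Intermediate data that carries the best model produced by the AutoML step.
   model_output = PipelineData(name="best_model", datastore=datastore,
                               pipeline_output_name="best_model",
                               training_output=TrainingOutput(type="Model"))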

Enums

TimeZone

Enumerates the valid time zones for a recurrence Schedule.