PipelineDataset class

Definition

Models data associated with an input of a StepRun that comes from a Dataset.

By default, the name of the dataset, the definition version, and the snapshot name (if a snapshot is used) are combined to form the name of the input. You can override the name with this class.

PipelineDataset(dataset=None, name=None, bind_mode='mount', path_on_compute=None, overwrite=False, parameter_name=None)

Inheritance

builtins.object
PipelineDataset

Parameters

dataset
Dataset or DatasetDefinition

The dataset that will be used as the input to the step.

name
str

The name of the input in the pipeline.

bind_mode
str

How the dataset should be made available: either 'mount' or 'download'. The default is 'mount'.

path_on_compute
str

The path on the compute where the data will be made available.

overwrite
bool

Whether to overwrite existing data or not.

parameter_name
str

The parameter name of the dataset. This is used for published pipelines.

Remarks

Use PipelineDataset when constructing a Pipeline to specify that the input of a step is a Dataset.

The following example uses a dataset as a step input:


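   from azureml.core import Dataset, Workspace
   from azureml.pipeline.core import Pipeline
   from azureml.pipeline.steps import PythonScriptStep

   # Load the workspace from a local config.json file.
   ws = Workspace.from_config()
   # Look up an existing compute target by name.
   compute = ws.compute_targets["<compute_name>"]

   dataset = Dataset.get_by_name(ws, "<dataset_name>")
   # Name the input and mount it on the compute target.
   dataset_input = dataset.as_named_input("<input_name>").as_mount()

   step = PythonScriptStep(
       name='train step',
       script_name="train.py",
       compute_target=compute,
       arguments=["--input", dataset_input],
       inputs=[dataset_input]
   )

   pipeline = Pipeline(workspace=ws, steps=[step])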

Methods

create(dataset, name=None, parameter_name=None)

Create a PipelineDataset from an Azure Machine Learning Dataset.

default_name(dataset)

Get the default port name of a dataset/dataset definition.

is_dataset(dset)

Determine whether the input is a dataset or a dataset definition.

validate_dataset(dset)

Validate the state of the dataset.

It logs a warning if the dataset is deprecated and raises an error if the dataset is archived.

create(dataset, name=None, parameter_name=None)

Create a PipelineDataset from an Azure Machine Learning Dataset.

create(dataset, name=None, parameter_name=None)

Parameters

dataset
Dataset or DatasetDefinition or DatasetConsumptionConfig or PipelineDataset

The dataset to create the PipelineDataset from.

name
str

The name of the input dataset. If None, a name will be derived based on the type of the input.

default value: None

parameter_name
str

The pipeline parameter name.

default value: None

Returns

The created PipelineDataset.

Return type

PipelineDataset
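
As a rough illustration of calling create, the sketch below wraps it in a small helper. The helper name to_pipeline_input is hypothetical, and the import path azureml.pipeline.core is an assumption about where the class is exported; only the create(dataset, name=None, parameter_name=None) signature comes from this page.

```python
def to_pipeline_input(dataset, input_name, parameter_name=None):
    """Wrap a dataset as a named pipeline input (hypothetical helper)."""
    # Imported lazily; the import path is an assumption, not confirmed here.
    from azureml.pipeline.core import PipelineDataset

    # create() accepts a Dataset, DatasetDefinition,
    # DatasetConsumptionConfig, or an existing PipelineDataset.
    return PipelineDataset.create(
        dataset, name=input_name, parameter_name=parameter_name
    )
```

Passing name explicitly overrides the default name that would otherwise be derived from the type of the input.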

default_name(dataset)

Get the default port name of a dataset/dataset definition.

default_name(dataset)

Parameters

dataset
object

The dataset to calculate the name from.

Returns

The name.

Return type

str
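
The default naming rule from the class definition (dataset name, definition version, and snapshot name when a snapshot is used) can be pictured with a small sketch. The function compose_default_name, its parameters, and the underscore separator are all hypothetical; the format that default_name actually produces is not stated on this page and may differ.

```python
from typing import Optional


def compose_default_name(
    dataset_name: str,
    definition_version: str,
    snapshot_name: Optional[str] = None,
) -> str:
    """Hypothetical model of default input naming; not the SDK code."""
    parts = [dataset_name, definition_version]
    # The snapshot name is included only when a snapshot is used.
    if snapshot_name:
        parts.append(snapshot_name)
    # The underscore separator is an assumption made for illustration.
    return "_".join(parts)
```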

is_dataset(dset)

Determine whether the input is a dataset or a dataset definition.

is_dataset(dset)

Parameters

dset
object

The input.

Returns

Whether input is a dataset or a dataset definition.

Return type

bool

validate_dataset(dset)

Validate the state of the dataset.

It logs a warning if the dataset is deprecated and raises an error if the dataset is archived.

validate_dataset(dset)

Parameters

dset
Dataset or DatasetDefinition or DatasetConsumptionConfig

The dataset to be verified.
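
The documented behavior of validate_dataset (warn on a deprecated dataset, fail on an archived one) can be modeled as follows. This is a minimal sketch and not the SDK implementation: the function check_dataset_state, the state labels, and the choice of ValueError are assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def check_dataset_state(state: str) -> None:
    """Hypothetical model of the documented validation behavior."""
    if state == "deprecated":
        # A deprecated dataset is still usable, so only warn.
        logger.warning("Dataset is deprecated.")
    elif state == "archived":
        # An archived dataset cannot be used; the exception type is assumed.
        raise ValueError("Dataset is archived and cannot be used.")
```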

Attributes

bind_mode

Get how the dataset should be made available.

Returns

The bind mode.

Return type

str

dataset

Get the dataset this input is bound to.

Returns

The dataset.

Return type

Dataset or DatasetDefinition

dataset_id

Get the dataset ID.

Returns

The dataset ID.

Return type

str

dataset_version

Get the dataset definition's version.

Returns

The dataset version.

Return type

str

name

Get the name of the input.

Returns

The name.

Return type

str

overwrite

Get a value indicating whether to overwrite existing data.

Returns

Whether existing data is overwritten.

Return type

bool

parameter_name

Get the pipeline parameter name of this pipeline dataset.

Returns

The parameter name.

Return type

str

path_on_compute

Get the path where the data will be made available on the compute.

Returns

The path on compute.

Return type

str

saved_dataset_id

Return the saved ID of the dataset in the PipelineDataset.

Returns

The saved ID of the dataset.

Return type

str

workspace

Get the workspace the dataset belongs to.

Returns

The workspace.

Return type

Workspace