PipelineOutputTabularDataset Class

Represent intermediate pipeline data promoted to an Azure Machine Learning Tabular Dataset.

Once intermediate data is promoted to an Azure Machine Learning Dataset, subsequent steps also consume it as a Dataset instead of as a DataReference.

Create intermediate data that will be promoted to an Azure Machine Learning Dataset.

Inheritance
PipelineOutputTabularDataset

Constructor

PipelineOutputTabularDataset(pipeline_output_dataset, additional_transformations)

Parameters

Name Description
pipeline_output_dataset
Required

The file dataset that represents the intermediate output which will be transformed to a tabular Dataset.

additional_transformations
Required
azureml.dataprep.Dataflow

Additional transformations that will be applied on top of the file dataset.
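
The relationship between the two constructor arguments can be pictured with a small pure-Python model. This is a sketch under stated assumptions, not the Azure Machine Learning API: `TabularView`, `then`, `materialize`, and `parse_csv_lines` are hypothetical names, a plain function stands in for the file dataset, and plain callables stand in for the `Dataflow` transformations. The model assumes the transformations are recorded first and applied only when the data is read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Transform = Callable[[List[Row]], List[Row]]


@dataclass(frozen=True)
class TabularView:
    """Hypothetical model: a file-backed source plus queued transformations."""

    read_files: Callable[[], List[Row]]  # stands in for pipeline_output_dataset
    transforms: Tuple[Transform, ...] = ()  # stands in for additional_transformations

    def then(self, step: Transform) -> "TabularView":
        # Record one more transformation in a new view; self is left unchanged.
        return TabularView(self.read_files, self.transforms + (step,))

    def materialize(self) -> List[Row]:
        # Read the source, then apply every recorded transformation in order.
        rows = self.read_files()
        for step in self.transforms:
            rows = step(rows)
        return rows


def parse_csv_lines() -> List[Row]:
    # Hypothetical file contents, parsed into one dict per record.
    lines = ["id,score", "1,0.5", "2,0.9"]
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]


view = TabularView(parse_csv_lines)
high = view.then(lambda rows: [r for r in rows if float(r["score"]) > 0.6])
print(high.materialize())  # [{'id': '2', 'score': '0.9'}]
print(len(view.materialize()))  # 2
```

Returning a new view from `then` mirrors how the methods below are documented to return new intermediate data rather than modifying the existing object.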


Methods

create_input_binding

Create an input binding.

drop_columns

Drop the specified columns from the dataset.

keep_columns

Keep the specified columns and drop all others from the dataset.

random_split

Randomly split the records in the dataset into two parts, approximately by the specified percentage.

create_input_binding

Create an input binding.

create_input_binding()

Returns

Type Description

The InputPortBinding with this intermediate data as the source.

drop_columns

Drop the specified columns from the dataset.

drop_columns(columns)

Parameters

Name Description
columns
Required
str or list[str]

The name or a list of names for the columns to drop.

Returns

Type Description

Returns new intermediate data with the specified columns dropped.
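
The column-dropping semantics can be illustrated on in-memory rows. This is a minimal sketch, not the SDK implementation: the standalone `drop_columns` function and the list-of-dicts row format are hypothetical, chosen only to show that the parameter accepts either one name or a list of names and that the result is new data rather than a modified original.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, object]


def drop_columns(rows: List[Row], columns: Union[str, Sequence[str]]) -> List[Row]:
    """Hypothetical in-memory model of drop_columns."""
    # Accept a single name or a list of names, like the documented parameter.
    to_drop = {columns} if isinstance(columns, str) else set(columns)
    # Build new dicts so the input rows stay untouched; names that are not
    # present are simply ignored in this sketch.
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


rows = [{"id": 1, "name": "a", "score": 0.5}, {"id": 2, "name": "b", "score": 0.9}]
print(drop_columns(rows, "name"))  # [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.9}]
print(drop_columns(rows, ["name", "score"]))  # [{'id': 1}, {'id': 2}]
```

How the real method treats a column name that does not exist is not covered here; the sketch's choice to ignore it is an assumption.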

keep_columns

Keep the specified columns and drop all others from the dataset.

keep_columns(columns)

Parameters

Name Description
columns
Required
str or list[str]

The name or a list of names for the columns to keep.

Returns

Type Description

Returns new intermediate data that contains only the specified columns.
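
`keep_columns` is the complement of `drop_columns`: it names the columns to retain instead of the ones to remove. The sketch below models that on in-memory rows; the standalone `keep_columns` function and the list-of-dicts row format are hypothetical illustrations, not the SDK's implementation.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, object]


def keep_columns(rows: List[Row], columns: Union[str, Sequence[str]]) -> List[Row]:
    """Hypothetical in-memory model of keep_columns."""
    # Accept a single name or a list of names, like the documented parameter.
    to_keep = {columns} if isinstance(columns, str) else set(columns)
    # Copy only the retained columns into new dicts, preserving each row's
    # original column order; the input rows are not modified.
    return [{k: v for k, v in row.items() if k in to_keep} for row in rows]


rows = [{"id": 1, "name": "a", "score": 0.5}, {"id": 2, "name": "b", "score": 0.9}]
print(keep_columns(rows, "id"))  # [{'id': 1}, {'id': 2}]
print(keep_columns(rows, ["id", "score"]))  # [{'id': 1, 'score': 0.5}, {'id': 2, 'score': 0.9}]
```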

random_split

Randomly split the records in the dataset into two parts, approximately by the specified percentage.

random_split(percentage, seed=None)

Parameters

Name Description
percentage
Required

The approximate percentage to split the dataset by. This must be a number between 0.0 and 1.0.

seed
int

Optional seed to use for the random generator.

default value: None

Returns

Type Description

Returns a tuple of new TabularDataset objects representing the two datasets after the split.
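
One way to see why the split is only approximate, and what `seed` buys you, is per-record sampling. The sketch below assumes that scheme; it is not a description of the SDK's internal algorithm. The standalone `random_split` function is hypothetical, it sends each record to the first part with probability `percentage`, and it uses Python's `random.Random` so that the same seed reproduces the same split.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    records: Sequence[T], percentage: float, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Hypothetical model of an approximate, optionally seeded random split."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    rng = random.Random(seed)  # a fixed seed gives a repeatable split
    first: List[T] = []
    second: List[T] = []
    for record in records:
        # Each record independently lands in the first part with probability
        # `percentage`, so part sizes are only approximately proportional.
        (first if rng.random() < percentage else second).append(record)
    return first, second


a, b = random_split(range(1000), 0.8, seed=42)
a2, b2 = random_split(range(1000), 0.8, seed=42)
print(len(a) + len(b))  # 1000
print(a == a2)  # True
```

With 1,000 records and `percentage=0.8`, the first part holds roughly, but not exactly, 800 records in this sketch.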