PipelineOutputTabularDataset Class

Represent intermediate pipeline data promoted to an Azure Machine Learning Tabular Dataset.

Once intermediate data is promoted to an Azure Machine Learning Dataset, subsequent steps consume it as a Dataset instead of a DataReference.

Create intermediate data that will be promoted to an Azure Machine Learning Dataset.

Inheritance: PipelineOutputAbstractDataset -> PipelineOutputTabularDataset

Constructor

PipelineOutputTabularDataset(pipeline_output_dataset, additional_transformations)

Parameters

Name	Description
pipeline_output_dataset Required	PipelineOutputFileDataset The file dataset that represents the intermediate output which will be transformed to a tabular Dataset.
additional_transformations Required	azureml.dataprep.Dataflow Additional transformations that will be applied on top of the file dataset.

Methods

create_input_binding	Create an input binding.
drop_columns	Drop the specified columns from the dataset.
keep_columns	Keep the specified columns and drop all others from the dataset.
random_split	Split records in the dataset into two parts randomly and approximately by the percentage specified.

create_input_binding

Create an input binding.

create_input_binding()

Returns

Type	Description
InputPortBinding	The InputPortBinding with this PipelineData as the source.

drop_columns

Drop the specified columns from the dataset.

drop_columns(columns)

Parameters

Name	Description
columns Required	str or list[str] The name or a list of names for the columns to drop.

Returns

Type	Description
PipelineOutputTabularDataset	A new intermediate dataset with the specified columns dropped.

keep_columns

Keep the specified columns and drop all others from the dataset.

keep_columns(columns)

Parameters

Name	Description
columns Required	str or list[str] The name or a list of names for the columns to keep.

Returns

Type	Description
PipelineOutputTabularDataset	A new intermediate dataset with only the specified columns kept.
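
The column semantics of drop_columns and keep_columns can be illustrated with a small, library-independent sketch. It does not call the Azure Machine Learning SDK: the rows are plain dictionaries, and the helper functions below are hypothetical stand-ins that only mirror the documented behavior (a single name or a list of names is accepted, and a new object is returned rather than the original being modified).

```python
from typing import Dict, List, Union

Row = Dict[str, object]
Columns = Union[str, List[str]]


def _as_list(columns: Columns) -> List[str]:
    # Accept either one column name or a list of names.
    return [columns] if isinstance(columns, str) else list(columns)


def drop_columns(rows: List[Row], columns: Columns) -> List[Row]:
    # Hypothetical stand-in: build new rows without the named columns.
    dropped = set(_as_list(columns))
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


def keep_columns(rows: List[Row], columns: Columns) -> List[Row]:
    # Hypothetical stand-in: build new rows containing only the named columns.
    kept = set(_as_list(columns))
    return [{k: v for k, v in row.items() if k in kept} for row in rows]


rows = [
    {"id": 1, "name": "a", "score": 0.5},
    {"id": 2, "name": "b", "score": 0.9},
]

without_score = drop_columns(rows, "score")   # single name
only_id = keep_columns(rows, ["id"])          # list of names
```

In this sketch the input rows are left untouched, which matches the Returns descriptions above: both methods produce a new intermediate dataset.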

random_split

Split records in the dataset into two parts randomly and approximately by the percentage specified.

random_split(percentage, seed=None)

Parameters

Name	Description
percentage Required	float The approximate proportion of records to place in the first dataset, expressed as a number between 0.0 and 1.0.
seed	int Optional seed to use for the random generator. Default value: None

Returns

Type	Description
(TabularDataset, TabularDataset)	Returns a tuple of new TabularDataset objects representing the two datasets after the split.
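
One way to see why the split is described as approximate is the sketch below. It is an assumption, not the SDK's implementation: the source does not say how records are assigned, so this model simply places each record in the first part with probability equal to percentage. The function name and signature mirror the documented method, but the body is illustrative only.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_split(
    records: Sequence[int], percentage: float, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    # Illustrative model only: assign each record independently.
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    first: List[int] = []
    second: List[int] = []
    for record in records:
        target = first if rng.random() < percentage else second
        target.append(record)
    return first, second


records = list(range(1000))
train, test = random_split(records, 0.8, seed=42)
train_again, test_again = random_split(records, 0.8, seed=42)
```

Under this model the first part holds roughly 80% of the records rather than exactly 800, every record lands in exactly one part, and repeating the call with the same seed reproduces the same split.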
