SplitColumnByExampleBuilder class

Definition

Interactive object that can be used to learn a program for splitting a column into a set of columns based on provided examples.

SplitColumnByExampleBuilder(dataflow: azureml.dataprep.api.dataflow.Dataflow, engine_api: azureml.dataprep.api.engineapi.api.EngineAPI, source_column: str, keep_delimiters: bool = False, delimiters: typing.List[str] = None)
Inheritance
builtins.object
SplitColumnByExampleBuilder

Methods

add_example(example: SplitExample) -> None

Adds an example value that will be used when learning a program to split the column.

Note

If an identical example is already present, this will do nothing.

If a conflicting example is given (identical source but different results), an exception will be raised.

delete_example(example_index: int)

Deletes an example so that it is no longer considered in program generation.

generate_suggested_examples

Lists examples that, if provided, would improve confidence in the generated program.

Note

This operation pulls data internally in order to generate suggestions.

learn() -> None

Learns a program that splits source_column into multiple columns based on the delimiters or examples provided.

list_examples

Gets examples that are currently used to generate a program to split a column.

preview

Previews the result of the generated program.

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Uses the program learned based on the provided examples to derive a new column and create a new dataflow.

add_example(example: SplitExample) -> None

Adds an example value that will be used when learning a program to split the column.

Note

If an identical example is already present, this will do nothing.

If a conflicting example is given (identical source but different results), an exception will be raised.

add_example(example: SplitExample) -> None

Parameters

example

Tuple of a source value and a list of intended splits. The source value can be provided as a string or as a key-value pair with the source column as the key.
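
The duplicate and conflict rules in the note above can be illustrated with a minimal, self-contained sketch. It does not use azureml.dataprep: ToySplitExample and ToyExampleStore are hypothetical names invented here, the choice of ValueError for the conflict case is an assumption (the documentation only says an exception is raised), and the real builder may store examples differently.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for an example: (source value, intended splits).
ToySplitExample = Tuple[str, List[str]]


class ToyExampleStore:
    """Illustrative only; not part of azureml.dataprep."""

    def __init__(self) -> None:
        # Maps each source value to its intended splits.
        self._examples: Dict[str, List[str]] = {}

    def add_example(self, example: ToySplitExample) -> None:
        source, splits = example
        existing = self._examples.get(source)
        if existing is None:
            self._examples[source] = list(splits)
        elif existing != list(splits):
            # Same source value, different intended splits: a conflict.
            # ValueError is an assumption; the real exception type may differ.
            raise ValueError(f"Conflicting example for source {source!r}")
        # Otherwise the example is identical to a stored one: do nothing.

    def list_examples(self) -> List[ToySplitExample]:
        return [(source, list(splits)) for source, splits in self._examples.items()]


store = ToyExampleStore()
store.add_example(("2019-01-15", ["2019", "01", "15"]))
store.add_example(("2019-01-15", ["2019", "01", "15"]))  # identical: ignored
```

Adding ("2019-01-15", ["2019-01", "15"]) to the same store would raise, because the source value matches a stored example but the intended splits differ.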

delete_example(example_index: int)

Deletes an example so that it is no longer considered in program generation.

delete_example(example_index: int)

Parameters

example_index

Index of the example to delete.

generate_suggested_examples

Lists examples that, if provided, would improve confidence in the generated program.

Note

This operation pulls data internally in order to generate suggestions.

Returns

pandas.DataFrame of suggested examples.

learn() -> None

Learns a program that splits source_column into multiple columns based on the delimiters or examples provided.

learn() -> None

Remarks

After this function is called, an attempt is made to generate a program that satisfies all the provided constraints. Raises ValueError if the program can't be generated.
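
To make "a program that satisfies all the provided constraints" concrete, here is a toy sketch that searches a fixed list of candidate delimiters and returns the first one that reproduces every example. This is not the library's learning algorithm, which is not described here; toy_learn_delimiter and its candidates parameter are hypothetical names used only for illustration.

```python
from typing import List, Sequence, Tuple


def toy_learn_delimiter(
    examples: Sequence[Tuple[str, List[str]]],
    candidates: Sequence[str] = (",", ";", "-", "/", " ", "|"),
) -> str:
    """Return the first candidate delimiter consistent with every example.

    Hypothetical helper for illustration; not the library's algorithm.
    """
    for delimiter in candidates:
        # A candidate is accepted only if it reproduces all intended splits.
        if all(source.split(delimiter) == splits for source, splits in examples):
            return delimiter
    # Mirrors the documented behavior: no program satisfies the constraints.
    raise ValueError("No candidate delimiter satisfies all examples")


learned = toy_learn_delimiter([
    ("2019-01-15", ["2019", "01", "15"]),
    ("2020-12-31", ["2020", "12", "31"]),
])
```

In this sketch, examples that no single candidate can explain, such as one split on "-" and another split on "/", lead to ValueError, which parallels the failure case described in the remarks.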

list_examples

Gets examples that are currently used to generate a program to split a column.

Returns

pandas.DataFrame with examples.

preview

Previews the result of the generated program.

Parameters

skip

Number of rows to skip. Lets you move the preview window forward. Default is 0.

count

Number of rows to preview. Default is 10.

Returns

pandas.DataFrame with preview data.

Remarks

The returned DataFrame consists of the source column used by the program and all generated splits.
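
The way skip and count define a preview window can be sketched in plain Python. The helper below is hypothetical (toy_preview is not part of the library) and returns a list of dictionaries instead of a pandas.DataFrame, so that the example has no dependencies; only the windowing logic is meant to carry over.

```python
from itertools import islice
from typing import Dict, Iterable, List


def toy_preview(
    rows: Iterable[Dict[str, str]], skip: int = 0, count: int = 10
) -> List[Dict[str, str]]:
    """Return up to `count` rows after skipping the first `skip` rows."""
    return list(islice(rows, skip, skip + count))


# Thirty made-up rows with a single source column named "date".
rows = [{"date": f"2019-01-{day:02d}"} for day in range(1, 31)]

first_page = toy_preview(rows)            # default window: first 10 rows
second_page = toy_preview(rows, skip=10)  # window moved forward by 10 rows
```

Increasing skip in steps of count pages through the data; a window that runs past the end simply returns fewer rows in this sketch.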

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Uses the program learned based on the provided examples to derive a new column and create a new dataflow.

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Returns

A new Dataflow with a derived column.

Attributes

delimiters

One of the options for generating a split program is to provide a list of delimiters that should be used.

Returns

If delimiters were provided, returns them.

keep_delimiters

Controls whether columns with delimiters should be kept in the resulting data.
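
As a rough illustration of how a delimiter list and a keep-delimiters flag might interact, the sketch below splits a single string with Python's re module. It assumes that "keeping" delimiters means they appear as their own parts in the output; that interpretation, and the helper name toy_split, are assumptions made for this example and are not taken from the library.

```python
import re
from typing import List


def toy_split(value: str, delimiters: List[str], keep_delimiters: bool = False) -> List[str]:
    """Split `value` on any of `delimiters`, optionally keeping them as parts.

    Hypothetical helper; expects a non-empty list of non-empty delimiters.
    """
    alternation = "|".join(re.escape(d) for d in delimiters)
    # A capturing group makes re.split return the matched delimiters too.
    pattern = f"({alternation})" if keep_delimiters else alternation
    return re.split(pattern, value)


without = toy_split("2019-01-15", ["-"])
with_kept = toy_split("2019-01-15", ["-"], keep_delimiters=True)
```

Here without is ["2019", "01", "15"], while with_kept also contains the "-" parts between the values.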