DeriveColumnByExampleBuilder class

Definition

Interactive object that can be used to learn a program for deriving a column from a set of source columns and examples.

DeriveColumnByExampleBuilder(dataflow: azureml.dataprep.api.dataflow.Dataflow, engine_api: azureml.dataprep.api.engineapi.api.EngineAPI, source_columns: typing.List[str], new_column_name: str)
Inheritance
builtins.object
DeriveColumnByExampleBuilder

Methods

add_example(source_data: SourceData, example_value: str) -> None

Adds an example value that will be used when learning a program to derive the new column.

delete_example

Deletes an example so that it is no longer considered during program generation.

Note

Accepts either a full example row from the list_examples() result or just an example_id.

generate_suggested_examples

Lists examples that, if provided, would improve confidence in the generated program.

Note

This operation pulls data internally in order to generate suggestions.

learn() -> None

Learns a program that adds a new column whose values satisfy the constraints set by the source data and the provided examples.

list_examples

Gets examples that are currently used to generate a program to derive a column.

preview

Previews the result of the generated program.

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Uses the program learned based on the provided examples to derive a new column and create a new dataflow.

add_example(source_data: SourceData, example_value: str) -> None

Adds an example value that will be used when learning a program to derive the new column.

add_example(source_data: SourceData, example_value: str) -> None

Parameters

source_data

Source data for the provided example. Generally this should be a Dict[str, str] or a pandas.Series in which the dictionary keys or the series index are column names and the values are the corresponding column values. The easiest way to provide source_data is to pass in a specific row of a pandas.DataFrame (e.g. df.iloc[2]).

example_value

Desired result for the provided source data.

Remarks

If an identical example is already present, this will do nothing. If a conflicting example is given (identical source_data but different example_value), an exception will be raised.
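A minimal sketch of supplying several examples follows. It assumes a builder has already been created, and it passes source_data as a plain dictionary, as described above. The helper name add_examples and the column names in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple


def add_examples(builder, examples: Iterable[Tuple[Dict[str, str], str]]) -> int:
    """Send (source_data, example_value) pairs to a builder.

    `builder` is assumed to expose add_example(source_data, example_value)
    as documented above. Returns the number of pairs that were sent.
    """
    sent = 0
    for source_data, example_value in examples:
        # source_data maps each source column name to its value in one row;
        # example_value is the desired content of the derived column.
        builder.add_example(source_data=source_data, example_value=example_value)
        sent += 1
    return sent
```

For instance, with hypothetical source columns first_name and last_name, a call could look like add_examples(builder, [({"first_name": "Ada", "last_name": "Lovelace"}, "Lovelace, Ada")]). Because a conflicting example raises an exception, callers may want to wrap the call in their own error handling.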

delete_example

Deletes an example so that it is no longer considered during program generation.

Note

Accepts either a full example row from the list_examples() result or just an example_id.

Parameters

example_id

ID of the example to delete.

example_row

Example row to delete.
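As a sketch of removing examples by ID, the helper below assumes that the parameters listed above can be passed by keyword under the names shown (example_id). The helper name delete_examples_by_id is hypothetical.

```python
from typing import Iterable


def delete_examples_by_id(builder, example_ids: Iterable[int]) -> int:
    """Delete each example whose ID is given; return how many deletions were requested.

    Assumption: builder.delete_example accepts `example_id` as a keyword argument.
    """
    requested = 0
    for example_id in example_ids:
        builder.delete_example(example_id=example_id)
        requested += 1
    return requested
```

The IDs would typically be read from the list_examples() result. Passing a full row through example_row is the alternative described in the note above.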

generate_suggested_examples

Lists examples that, if provided, would improve confidence in the generated program.

Note

This operation pulls data internally in order to generate suggestions.

Returns

pandas.DataFrame of suggested examples.

Return type

pandas.DataFrame

learn() -> None

Learns a program that adds a new column whose values satisfy the constraints set by the source data and the provided examples.

learn() -> None

Remarks

Calling this function will trigger an attempt to generate a program that satisfies all the provided constraints (examples).

list_examples

Gets examples that are currently used to generate a program to derive a column.

Returns

pandas.DataFrame with examples.

Return type

pandas.DataFrame

preview

Previews the result of the generated program.

Parameters

skip

Number of rows to skip. This lets you move the preview window forward. Default is 0.

count

Number of rows to preview. Default is 10.

Returns

pandas.DataFrame with preview data.

Return type

pandas.DataFrame

Remarks

The returned DataFrame contains all the source columns used by the program as well as the derived column.
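The skip and count parameters can be combined to page through the preview. The sketch below assumes both can be passed by keyword under the names listed above. The helper name preview_pages and its page_size and pages arguments are hypothetical.

```python
from typing import Iterator


def preview_pages(builder, page_size: int = 10, pages: int = 3) -> Iterator:
    """Yield successive preview windows of `page_size` rows each.

    Page 0 skips 0 rows, page 1 skips `page_size` rows, and so on.
    """
    for page in range(pages):
        # Move the preview window forward by one page on each iteration.
        yield builder.preview(skip=page * page_size, count=page_size)
```

Because this is a generator, no preview call is made until the caller iterates, which keeps unused pages from being requested.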

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Uses the program learned based on the provided examples to derive a new column and create a new dataflow.

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Returns

A new Dataflow with a derived column.
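Putting the methods together, one possible workflow is sketched below: add examples, learn a program, inspect a preview, and then produce the new dataflow. It assumes a builder instance already exists and uses only the methods documented on this page. The helper name derive_column is hypothetical, and calling learn() explicitly before to_dataflow() is a choice made for clarity rather than a documented requirement.

```python
from typing import Dict, Iterable, Tuple


def derive_column(builder, examples: Iterable[Tuple[Dict[str, str], str]]):
    """Run an add_example -> learn -> preview -> to_dataflow sequence.

    Returns a (dataflow, preview) pair so the caller can check the preview
    before using the dataflow.
    """
    for source_data, example_value in examples:
        builder.add_example(source_data=source_data, example_value=example_value)

    # Attempt to generate a program that satisfies all provided examples.
    builder.learn()

    # Default preview window: skip 0 rows, show 10 rows.
    preview = builder.preview()

    # Create a new dataflow that includes the derived column.
    dataflow = builder.to_dataflow()
    return dataflow, preview
```

If the preview shows unexpected values, further examples can be added (for instance, rows taken from generate_suggested_examples()) and the sequence repeated.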