JoinBuilder class

Definition

An interactive object that can be used to help join two Dataflows.

Note

This builder can detect and suggest potential join arguments. In some cases, it can derive the join key in one of the Dataflows and use the derived key column to perform the join.

JoinBuilder(engine_api: azureml.dataprep.api.engineapi.api.EngineAPI, left_dataflow: DataflowReference, right_dataflow: DataflowReference, join_key_pairs: typing.List[typing.Tuple[str, str]] = None, join_type: azureml.dataprep.api.engineapi.typedefinitions.JoinType = <JoinType.MATCH: 2>, left_column_prefix: str = 'l_', right_column_prefix: str = 'r_')
Inheritance
builtins.object
JoinBuilder

Methods

apply_suggestion(suggestion_index: int) -> None

Applies the parameters of the selected join suggestion to the builder's arguments.

detect_column_info() -> None

Pulls data from the provided Dataflows to automatically set the left and right column prefixes and the non-prefixed columns.

generate_suggested_join() -> None

Pulls data from both the left and right Dataflows and analyzes it to come up with potential join arguments.

list_join_suggestions() -> str

Returns the suggested join variants.

preview

Preview of the join result.

to_dataflow()

Uses the current state of the builder to create a new Dataflow by joining the two provided Dataflows.
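The methods listed above suggest a natural call order. The sketch below is an assumption about how they might be combined, not a documented workflow: `run_suggested_join` is a hypothetical helper, and `RecordingBuilder` is a stand-in that only records the calls it receives, so the example runs without azureml.dataprep installed. Only method names that appear in this reference are used.

```python
from typing import Any, List


def run_suggested_join(builder: Any, suggestion_index: int = 0) -> Any:
    """Drive a JoinBuilder-like object through the suggestion workflow.

    Assumption: column detection and suggestion generation run before a
    suggestion is applied. This ordering is not stated in the reference.
    """
    builder.detect_column_info()          # set prefixes from the data
    builder.generate_suggested_join()     # analyze both Dataflows
    print(builder.list_join_suggestions())
    builder.apply_suggestion(suggestion_index)
    return builder.to_dataflow()


class RecordingBuilder:
    """Hypothetical stand-in for JoinBuilder; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def detect_column_info(self) -> None:
        self.calls.append("detect_column_info")

    def generate_suggested_join(self) -> None:
        self.calls.append("generate_suggested_join")

    def list_join_suggestions(self) -> str:
        self.calls.append("list_join_suggestions")
        return "0: <placeholder suggestion>"

    def apply_suggestion(self, suggestion_index: int) -> None:
        self.calls.append(f"apply_suggestion({suggestion_index})")

    def to_dataflow(self) -> str:
        self.calls.append("to_dataflow")
        return "<joined Dataflow placeholder>"


stand_in = RecordingBuilder()
result = run_suggested_join(stand_in)
```

With a real JoinBuilder, the builder itself would be passed in place of the stand-in, and the suggestion index would be chosen after reading the output of `list_join_suggestions()`.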

apply_suggestion(suggestion_index: int) -> None

Applies the parameters of the selected join suggestion to the builder's arguments.

Parameters

suggestion_index

Index of join suggestion to apply.

detect_column_info() -> None

Pulls data from the provided Dataflows to automatically set the left and right column prefixes and the non-prefixed columns.

generate_suggested_join() -> None

Pulls data from both the left and right Dataflows and analyzes it to come up with potential join arguments.

Remarks

A join suggestion can either use existing columns in the provided Dataflows or generate a key column derived from an existing column in one of them. For instance, suppose the left Dataflow has a column 'Full Name' with values like 'Smith, John' and the right Dataflow has columns 'First Name' and 'Last Name' with values like 'John' and 'Smith'. The suggestion might be to derive a new column in the right Dataflow (called KEY_GENERATED{_n}) by concatenating 'Last Name' and 'First Name' with a comma in between, and then to join on that derived column in the right Dataflow and the 'Full Name' column in the left Dataflow.
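The derived-key idea in these remarks can be illustrated in plain Python. This is a minimal sketch of the concept only, not the library's implementation: the sample rows, the `derive_key` helper, and the exact separator (a comma followed by a space, chosen to match 'Smith, John') are assumptions, and the `{_n}` suffix on the generated column name is not modeled.

```python
# Hypothetical sample data standing in for the two Dataflows.
left_rows = [
    {"Full Name": "Smith, John", "Department": "Sales"},
    {"Full Name": "Doe, Jane", "Department": "Support"},
]
right_rows = [
    {"First Name": "John", "Last Name": "Smith", "City": "Seattle"},
    {"First Name": "Jane", "Last Name": "Doe", "City": "Boston"},
    {"First Name": "Ann", "Last Name": "Lee", "City": "Austin"},
]


def derive_key(row):
    """Build 'Last, First' so it lines up with the left 'Full Name' values."""
    return f"{row['Last Name']}, {row['First Name']}"


# Add the derived key column to every row of the right side.
right_with_key = [{**row, "KEY_GENERATED": derive_key(row)} for row in right_rows]

# Index the right rows by derived key so each lookup is a dictionary access.
right_by_key = {}
for row in right_with_key:
    right_by_key.setdefault(row["KEY_GENERATED"], []).append(row)

# Keep only left rows whose 'Full Name' matches a derived key.
joined = [
    {**left_row, **right_row}
    for left_row in left_rows
    for right_row in right_by_key.get(left_row["Full Name"], [])
]
```

Here 'Lee, Ann' has no counterpart on the left, so it does not appear in `joined`.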

list_join_suggestions() -> str

Returns the suggested join variants.

preview

Preview of the join result.

Parameters

skip

Number of rows to skip. Use this to move the preview window forward. Default is 0.

count

Number of rows to preview. Default is 10.

Returns

Preview data.

Return type

pandas.DataFrame
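The windowing behavior of `skip` and `count` can be sketched as follows. This illustrates the described semantics on a plain list, under the assumption that the preview returns `count` rows starting at offset `skip`; `preview_window` is a hypothetical helper and does not call the library.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def preview_window(
    rows: Iterable[Dict[str, Any]], skip: int = 0, count: int = 10
) -> List[Dict[str, Any]]:
    """Return up to `count` rows, starting after the first `skip` rows."""
    return list(islice(rows, skip, skip + count))


# Hypothetical join result with 25 rows.
rows = [{"id": i} for i in range(25)]

first_page = preview_window(rows)            # rows 0 through 9
second_page = preview_window(rows, skip=10)  # rows 10 through 19
last_page = preview_window(rows, skip=20)    # only 5 rows remain
```

Advancing `skip` by `count` on each call pages through the result without materializing it all at once.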

to_dataflow()

Uses the current state of the builder to create a new Dataflow by joining the two provided Dataflows.

Returns

New Dataflow.

Attributes

join_key_pairs

Join key pairs, represented as a list of tuples in which the first value is a column name from the left Dataflow and the second value is a column name from the right Dataflow.

join_type

Type of join to perform.

left_column_prefix

Prefix to use on all columns from the left Dataflow.

right_column_prefix

Prefix to use on all columns from the right Dataflow.
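To show how these attributes might interact, here is a plain-Python sketch of a join driven by key pairs and column prefixes. It rests on two assumptions that this reference does not confirm: that the default join type behaves like an inner join, and that each prefix is simply prepended to every column name from its side. `inner_join` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def inner_join(
    left: List[Row],
    right: List[Row],
    join_key_pairs: List[Tuple[str, str]],
    left_column_prefix: str = "l_",
    right_column_prefix: str = "r_",
) -> List[Row]:
    """Join rows whose key columns match, prefixing every output column."""
    left_keys = [left_name for left_name, _ in join_key_pairs]
    right_keys = [right_name for _, right_name in join_key_pairs]

    # Index the right rows by their key tuple for fast lookup.
    index: Dict[tuple, List[Row]] = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in right_keys), []).append(row)

    joined: List[Row] = []
    for left_row in left:
        key = tuple(left_row[k] for k in left_keys)
        for right_row in index.get(key, []):
            out = {left_column_prefix + name: v for name, v in left_row.items()}
            out.update(
                {right_column_prefix + name: v for name, v in right_row.items()}
            )
            joined.append(out)
    return joined


orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 2, "total": 12},
    {"customer_id": 9, "total": 5},
]
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]

result = inner_join(orders, customers, join_key_pairs=[("customer_id", "id")])
```

Each tuple in `join_key_pairs` names the left column first and the right column second, so the two sides do not need to share column names. The prefixes keep the output columns distinguishable when both sides use the same name.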