Aggregator class

Definition

Defines an aggregation against specified columns identified with join keys.

Inheritance
builtins.object
Aggregator

Remarks

Aggregators are typically not instantiated directly. Instead, specify the type of aggregator when using an enricher such as the HolidayEnricher object.

Derived aggregators include AggregatorAll, AggregatorAvg, AggregatorMax, AggregatorMin, and AggregatorTop.

The process(env, customer_data, public_data, join_keys, debug) method performs the aggregation.

Methods

get_log_property()

Get log property tuple, None if no property.

process(env: typing.Union[azureml.opendatasets.environ.SparkEnv, azureml.opendatasets.environ.PandasEnv], customer_data: azureml.opendatasets.accessories.customer_data.CustomerData, public_data: azureml.opendatasets.accessories.public_data.PublicData, join_keys: list, debug: bool)

Left join customer_data with public_data on join_keys.

After the join, drop all columns in join_keys and all columns listed in to_be_cleaned_up_column_names.

process_public_dataset(env: azureml.opendatasets.environ.RuntimeEnv, _public_dataset: object, cols: typing.Union[typing.List[str], NoneType], join_keys: typing.List[typing.Tuple[str, str]] = []) -> object

Perform aggregation on specified public data columns.

get_log_property()

Get log property tuple, None if no property.

process(env: typing.Union[azureml.opendatasets.environ.SparkEnv, azureml.opendatasets.environ.PandasEnv], customer_data: azureml.opendatasets.accessories.customer_data.CustomerData, public_data: azureml.opendatasets.accessories.public_data.PublicData, join_keys: list, debug: bool)

Left join customer_data with public_data on join_keys.

After the join, drop all columns in join_keys and all columns listed in to_be_cleaned_up_column_names.

Parameters

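env
azureml.opendatasets.environ.SparkEnv or azureml.opendatasets.environ.PandasEnv

The runtime environment, either Spark or pandas.

customer_data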
<xref:azureml.opendatasets.accessories.customer_data.CustomerData>

The customer data.

public_data
<xref:azureml.opendatasets.accessories.public_data.PublicData>

The public data.

join_keys
list[tuple]

A list of join key pairs.

debug
bool

Indicates whether to print debug info.

Returns

A tuple of (a new instance of CustomerData, the unchanged instance of PublicData, a new joined instance of CustomerData, the join keys as a list of tuples).

Return type

Tuple[CustomerData, PublicData, CustomerData, List[Tuple[str, str]]]
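The join-then-clean-up behavior described for process can be pictured with a minimal, library-free sketch. This is not the azureml.opendatasets implementation: the function name left_join_and_clean, the row-as-dict representation, and the sample column names are all hypothetical, and the sketch assumes that each join key is a (customer column, public column) pair and that both columns of each pair are dropped after the join.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def left_join_and_clean(
    customer_rows: List[Row],
    public_rows: List[Row],
    join_keys: List[Tuple[str, str]],
    cleanup_columns: List[str],
) -> List[Row]:
    """Left join customer rows with public rows, then drop helper columns.

    Hypothetical illustration only; assumes customer and public column
    names do not collide.
    """
    # Index public rows by the values of their join-key columns.
    public_index: Dict[tuple, List[Row]] = {}
    for row in public_rows:
        key = tuple(row[public_col] for _, public_col in join_keys)
        public_index.setdefault(key, []).append(row)

    # Columns to remove once the join is done: both sides of every
    # join key pair, plus the explicit clean-up list.
    to_drop = set(cleanup_columns)
    for customer_col, public_col in join_keys:
        to_drop.update((customer_col, public_col))

    public_columns = {col for row in public_rows for col in row}

    joined: List[Row] = []
    for row in customer_rows:
        key = tuple(row[customer_col] for customer_col, _ in join_keys)
        # A left join keeps unmatched customer rows and fills the
        # public columns with None.
        matches = public_index.get(key) or [dict.fromkeys(public_columns)]
        for match in matches:
            merged = {**row, **match}
            joined.append({c: v for c, v in merged.items() if c not in to_drop})
    return joined


customer = [
    {"id": 1, "date_key": "2024-01-01", "zip": "98052"},
    {"id": 2, "date_key": "2024-01-02", "zip": "98052"},
]
public = [{"pub_date": "2024-01-01", "holiday": "New Year's Day", "tmp_rank": 1}]

result = left_join_and_clean(
    customer, public, join_keys=[("date_key", "pub_date")], cleanup_columns=["tmp_rank"]
)
```

With DataFrames, the same idea would usually be a left merge followed by a column drop; the dict-based version is used here only to keep the sketch dependency-free.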

process_public_dataset(env: azureml.opendatasets.environ.RuntimeEnv, _public_dataset: object, cols: typing.Union[typing.List[str], NoneType], join_keys: typing.List[typing.Tuple[str, str]] = []) -> object

Perform aggregation on specified public data columns.
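As a rough illustration of what aggregating specified public data columns could look like, the sketch below groups rows by a set of key columns and applies one aggregation function to each requested column. It is an assumption-laden sketch, not the library's code: aggregate_public_rows, the group_by parameter, and the sample data are hypothetical, and grouping by the public-side join key columns is an interpretation of the class description rather than documented behavior.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def aggregate_public_rows(
    public_rows: List[Row],
    cols: List[str],
    group_by: List[str],
    agg: Callable[[Sequence[float]], float],
) -> List[Row]:
    """Group rows by the group_by columns and aggregate each column in cols.

    Hypothetical helper; group_by stands in for the public-side join keys.
    """
    groups: Dict[tuple, List[Row]] = defaultdict(list)
    for row in public_rows:
        groups[tuple(row[c] for c in group_by)].append(row)

    result: List[Row] = []
    for key, rows in groups.items():
        # Start each output row with the grouping key values.
        out: Row = dict(zip(group_by, key))
        for col in cols:
            out[col] = agg([r[col] for r in rows])
        result.append(out)
    return result


weather = [
    {"day": "2024-01-01", "temp": 2.0},
    {"day": "2024-01-01", "temp": 4.0},
    {"day": "2024-01-02", "temp": 5.0},
]

# Averaging and taking the maximum loosely mirror what names such as
# AggregatorAvg and AggregatorMax suggest; that mapping is an assumption.
daily_avg = aggregate_public_rows(weather, cols=["temp"], group_by=["day"], agg=mean)
daily_max = aggregate_public_rows(weather, cols=["temp"], group_by=["day"], agg=max)
```

Passing the aggregation as a callable keeps the grouping logic in one place, which is one plausible reason a family of derived aggregator classes would share a common base.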

Parameters

env
azureml.opendatasets.environ.RuntimeEnv

The runtime environment.

_public_dataset
DataFrame

A public dataset dataframe.

cols
list

A list of column names to retrieve.

join_keys
list

A list of join keys to use.

Returns

A new DataFrame of the public dataset.

Return type

DataFrame

Attributes

should_direct_join

should_direct_join = True