TimeSeriesTransformer Class


Class for time series preprocessing.

Construct a TimeSeriesTransformer.

Inheritance

azureml.training.tabular.featurization._azureml_transformer.AzureMLTransformer -> TimeSeriesTransformer
azureml.training.tabular.featurization._featurization_info_provider.FeaturizationInfoProvider -> TimeSeriesTransformer

Constructor

TimeSeriesTransformer(pipeline: Pipeline, pipeline_type: TimeSeriesPipelineType, featurization_config: FeaturizationConfig, time_index_non_holiday_features: List[str], lookback_features_removed: bool, **kwargs: Any)

Parameters

Name	Description
pipeline_type Required	TimeSeriesPipelineType Type of pipeline to construct. Either Full or Reduced for CV split featurizing.
featurization_config Required	FeaturizationConfig The featurization config for customization.
kwargs Required	dict Dictionary containing metadata for the time series. time_column_name: the column containing dates. grain_column_names: the set of columns defining the multiple time series. origin_column_name: the latest date from which actual values of all features are assumed to be known with certainty. drop_column_names: the columns which need to be removed from the data set. group: the group column name.
pipeline Required
time_index_non_holiday_features Required
lookback_features_removed Required

Methods

add_dummy_order_column	Add the dummy order column to the pandas data frame.
fit	Fit the TimeSeriesTransformer.
fit_transform	Fit and transform data for a training scenario. There is no row data contract for the output DataFrame: the output may have a different number and ordering of rows than the input. The steps are: (1) fit the transformer, creating the internal transform pipeline; (2) common transform validation and preparation; (3) fill datetime gaps and impute missing target values; (4) call the internal pipeline's fit_transform method; (5) common finalization; (6) save the order of columns in the transformed data.
get_auto_lag	Return the heuristically inferred lag. If lags were not defined as auto, return None. ClientException is raised if fit was not called.
get_auto_max_horizon	Return auto max horizon. If max_horizon was not defined as auto, return None. May raise ClientException.
get_auto_window_size	Return the auto rolling window size. If rolling window was not defined as auto, return None. ClientException is raised if fit was not called.
get_col_internal_type	Get the internal type of a featured column. If it is a reserved column, return the column name. If it is a lag/rolling window column, return corresponding types defined in the transformer class. If it is a user input column, return other.
get_engineered_feature_names	Get the transformed column names.
get_featurization_summary	Return the featurization summary for all the input features seen by TimeSeriesTransformer.
get_json_strs_for_engineered_feature_names	Return JSON string list for engineered feature names.
get_params
get_target_lags	Return target lags if any.
get_target_rolling_window_size	Return the size of rolling window.
remove_rows_with_imputed_target
select_latest_origin_dates	Select rows from X with latest origin times within time-grain groups. Logic: group X by time and grain, find the latest origin in each group, and return the row containing the latest origin for each group.
transform	Transform data for a scoring scenario. The behavior depends on whether y is given. If y is not None, the output contains the target quantity in the self.target_column_name column, so that consumers of the transform can retrieve the target aligned to the transformed data; the transform also fills time index gaps and imputes missing target values. This is usually best for in-sample scoring scenarios. If y is None, the output is just the transformed feature DataFrame and does not have time index gaps filled. This is usually best for out-of-sample scoring scenarios. In either case, the output contains the columns determined during fit/training, in the same order. This method does not specify a contract for the rows of the output DataFrame: the output may have a different number and ordering of rows than the input. The transform steps are: (1) common validation and preparation; (2) remove rows that do not conform to the frequency determined during training; (3) if y is given, append it to the input, fill gaps, and impute missing target values; (4) infer the scoring data frequency and check that it is compatible with the training frequency; (5) call the internal pipeline's transform method; (6) add an indicator column for missing target values; (7) common finalization; (8) restore the column order determined during training.

add_dummy_order_column

Add the dummy order column to the pandas data frame.

add_dummy_order_column(X: DataFrame) -> None

Parameters

Name	Description
X Required	The data frame which will undergo order column addition.
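
The signature returns None, which suggests that X is modified in place. As a rough illustration of what adding an order column in place can look like, here is a minimal pandas sketch. The helper name add_order_column_sketch and the column name _sketch_order are hypothetical; the source does not state the column name the library actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical column name; the real name used by the library is not given in the source.
ORDER_COLUMN = "_sketch_order"


def add_order_column_sketch(X: pd.DataFrame) -> None:
    """Record the current row order of X in a new column, modifying X in place."""
    X[ORDER_COLUMN] = np.arange(len(X))


df = pd.DataFrame({"date": pd.to_datetime(["2021-01-02", "2021-01-01"]), "y": [2.0, 1.0]})
add_order_column_sketch(df)

# Even after sorting by date, the original row order can be restored.
restored = df.sort_values("date").sort_values(ORDER_COLUMN)
```

Keeping such a column is one way to recover the input ordering after steps that sort or reindex the data.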

fit

Fit the TimeSeriesTransformer.

fit(X: DataFrame, y: ndarray | None = None) -> TimeSeriesTransformer

Parameters

Name	Description
X Required	DataFrame Dataframe representing text, numerical or categorical input.
y	ndarray The target quantity. default value: None

Returns

Type	Description
	TimeSeriesTransformer

fit_transform

Fit and transform data for a training scenario.

Please note that there is no row data contract for the output DataFrame. That is, the output may have a different number and ordering of rows than the input.

The steps here are:

Fit the transformer: create the internal transform pipeline
Common transform validation and preparation
Fill datetime gaps and impute missing target values
Call the internal pipeline's fit_transform method
Common finalization
Save the order of columns in the transformed data

fit_transform(df: DataFrame, y: ndarray) -> DataFrame

Parameters

Name	Description
df Required	DataFrame Dataframe representing text, numerical or categorical input.
y Required	ndarray The target quantity.

Returns

Type	Description
	pandas.DataFrame
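
The step "Fill datetime gaps and impute missing target values" is one reason the output can have more rows than the input. The following is a minimal pandas sketch of that idea, not the library's implementation. It assumes a single series, a daily frequency, and forward-fill imputation; the function name and the target_was_imputed marker column are hypothetical.

```python
import pandas as pd


def fill_gaps_and_impute_sketch(df: pd.DataFrame, time_col: str,
                                target_col: str, freq: str = "D") -> pd.DataFrame:
    """Reindex one series onto a regular time grid, then forward-fill the target."""
    indexed = df.set_index(time_col).sort_index()
    full_index = pd.date_range(indexed.index.min(), indexed.index.max(), freq=freq)
    filled = indexed.reindex(full_index)
    # Flag rows whose target had to be imputed (assumed marker name).
    filled["target_was_imputed"] = filled[target_col].isna()
    filled[target_col] = filled[target_col].ffill()
    return filled.rename_axis(time_col).reset_index()


raw = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04"]),
    "sales": [10.0, 12.0, 15.0],
})
out = fill_gaps_and_impute_sketch(raw, "date", "sales")
```

In this toy input, 2021-01-03 is missing, so the output has four rows where the input had three.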

get_auto_lag

Return the heuristically inferred lag.

If lags were not defined as auto, return None.

get_auto_lag() -> List[int] | None

Returns

Type	Description
	Heuristically defined target lag or None.

Exceptions

Type	Description
ClientException	Raised if fit was not called.

get_auto_max_horizon

Return auto max horizon.

If max_horizon was not defined as auto, return None.

get_auto_max_horizon() -> int | None

Returns

Type	Description
	Heuristically defined max_horizon or None.

Exceptions

Type	Description
ClientException	

get_auto_window_size

Return the auto rolling window size.

If rolling window was not defined as auto, return None.

get_auto_window_size() -> int | None

Returns

Type	Description
	Heuristically defined rolling window size or None.

Exceptions

Type	Description
ClientException	Raised if fit was not called.
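
The get_auto_lag, get_auto_max_horizon and get_auto_window_size methods share one contract: return the inferred value only when the setting was requested as auto, otherwise return None. The sketch below shows that contract with a toy class. It is not the library class: AutoSettingSketch, its placeholder heuristic, and the use of RuntimeError in place of ClientException are all assumptions made for illustration.

```python
from typing import List, Optional, Union


class AutoSettingSketch:
    """Toy stand-in, not the library class: tracks a lag setting that may be 'auto'."""

    def __init__(self, target_lags: Union[str, List[int]]) -> None:
        self._requested = target_lags
        self._is_fit = False
        self._inferred: Optional[List[int]] = None

    def fit(self, series: List[float]) -> "AutoSettingSketch":
        if self._requested == "auto":
            # Placeholder heuristic, assumed for illustration only.
            self._inferred = [1] if len(series) > 1 else []
        self._is_fit = True
        return self

    def get_auto_lag(self) -> Optional[List[int]]:
        if not self._is_fit:
            # The library raises ClientException; RuntimeError stands in for it here.
            raise RuntimeError("fit must be called first")
        return self._inferred if self._requested == "auto" else None


auto = AutoSettingSketch("auto").fit([1.0, 2.0, 3.0])
fixed = AutoSettingSketch([1, 2]).fit([1.0, 2.0, 3.0])
```

A caller can therefore treat a None result as "the user supplied an explicit value" rather than as an error.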

get_col_internal_type

Get the internal type of a featured column. If it is a reserved column, return the column name. If it is a lag/rolling window column, return corresponding types defined in the transformer class. If it is a user input column, return other.

static get_col_internal_type(column_name: str) -> str

Parameters

Name	Description
column_name Required	The column name.

Returns

Type	Description
	If a column is generated by AutoML SDK, it will return the corresponding SDK type. If not, it will return "other"
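
The three-way rule described above (reserved column, lag or rolling window column, user input column) can be sketched as a simple classifier. Everything concrete in this sketch is hypothetical: the reserved column names, the naming patterns, and the returned type strings other than "other" are invented for illustration, because the source does not list them.

```python
import re

# All names and patterns below are hypothetical; the source does not list them.
RESERVED_COLUMNS = {"_sketch_origin", "_sketch_horizon"}
LAG_PATTERN = re.compile(r"_lag\d+")
ROLLING_WINDOW_PATTERN = re.compile(r"_window\d+")


def col_internal_type_sketch(column_name: str) -> str:
    """Classify a column name following the three-way rule described in the text."""
    if column_name in RESERVED_COLUMNS:
        # Reserved columns are returned under their own name.
        return column_name
    if LAG_PATTERN.search(column_name):
        return "lag"
    if ROLLING_WINDOW_PATTERN.search(column_name):
        return "rolling_window"
    # Anything not recognized is treated as a user input column.
    return "other"
```

For example, under these assumed patterns a column named sales_lag2 would be classified as a lag column, while price would fall through to "other".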

get_engineered_feature_names

Get the transformed column names.

get_engineered_feature_names() -> List[str] | None

Returns

Type	Description
	list of strings

get_featurization_summary

Return the featurization summary for all the input features seen by TimeSeriesTransformer.

get_featurization_summary(**kwargs: Any) -> List[Dict[str, Any | None]]

Keyword-Only Parameters

Name	Description
is_user_friendly	If True, return individual transformer params as well, otherwise, only return the detailed featurization summary.

Returns

Type	Description
	List of featurization summary for each input feature.

get_json_strs_for_engineered_feature_names

Return JSON string list for engineered feature names.

get_json_strs_for_engineered_feature_names(engi_feature_name_list: List[str] | None = None) -> List[str]

Parameters

Name	Description
engi_feature_name_list	Engineered feature names for which JSON strings are required. default value: None

Returns

Type	Description
	JSON string list for engineered feature names

get_params

get_params(deep=True)

Parameters

Name	Description
deep	default value: True

get_target_lags

Return target lags if any.

get_target_lags() -> List[int]

get_target_rolling_window_size

Return the size of rolling window.

get_target_rolling_window_size() -> int

remove_rows_with_imputed_target

remove_rows_with_imputed_target(X: DataFrame, y: ndarray) -> Tuple[DataFrame, ndarray]

Parameters

Name	Description
X Required
y Required

select_latest_origin_dates

Select rows from X with latest origin times within time-grain groups.

Logic: group X by time and grain, find the latest origin in each group, and return the row containing the latest origin for each group.

static select_latest_origin_dates(X: DataFrame, time_column_name: str, time_series_id_column_names: List[str], origin_column_name: str) -> DataFrame

Parameters

Name	Description
X Required	DataFrame The data frame to select rows from.
time_column_name Required	str The time column name from data frame.
time_series_id_column_names Required	List[str] The time series ID column names.
origin_column_name Required	str Origin time column name.

Returns

Type	Description
DataFrame	The data frame, containing only latest origins.
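
The group-then-pick-latest logic of select_latest_origin_dates can be reproduced with plain pandas. This is a sketch of the described behavior, not the library's code; the function name select_latest_origins_sketch and the example column names are assumptions.

```python
from typing import List

import pandas as pd


def select_latest_origins_sketch(X: pd.DataFrame, time_column_name: str,
                                 time_series_id_column_names: List[str],
                                 origin_column_name: str) -> pd.DataFrame:
    """Keep, for each (time, series ID) group, the row with the latest origin."""
    keys = [time_column_name] + list(time_series_id_column_names)
    # Sort so that the latest origin is the last row within each group.
    ordered = X.sort_values(keys + [origin_column_name])
    return ordered.drop_duplicates(subset=keys, keep="last").reset_index(drop=True)


data = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-03", "2021-01-03", "2021-01-04"]),
    "store": ["A", "A", "A"],
    "origin": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-02"]),
    "value": [1.0, 2.0, 3.0],
})
latest = select_latest_origins_sketch(data, "date", ["store"], "origin")
```

Here the two rows for 2021-01-03 collapse to the one whose origin is 2021-01-02, so the result has one row per date and store.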

transform

Transform data for a scoring scenario.

This transform has two different behaviors depending on whether y input is given:

If y is not None, the output will contain the target quantity in the self.target_column_name column; this ensures that consumers of the transform can retrieve the target aligned to the transformed data. The transform will also fill time index gaps and impute missing target values when y is given. This behavior is usually best for in-sample scoring scenarios.

If y is None, the output is just the transformed feature DataFrame and will not have time index gaps filled. This behavior is usually best for out-of-sample scoring scenarios.

In either case, the output will contain the columns determined during fit/training and in the same order as that determined at fit/training. Please note that this method does not specify a contract for the rows of the output DataFrame. That is, the output may have a different number and ordering of rows than the input.

The transform steps are:

Common validation and preparation
Remove rows that do not conform to the frequency determined during training
If y input is given, append to the input, fill gaps and impute missing target values
Infer the scoring data frequency and check that it is compatible with training frequency
Call the internal pipeline's transform method
Add an indicator column for missing target values
Common finalization
Restore column order determined during training

transform(df: DataFrame, y: ndarray | None = None) -> DataFrame

Parameters

Name	Description
df Required	DataFrame Dataframe representing text, numerical or categorical input.
y	ndarray The target quantity. default value: None

Returns

Type	Description
	pandas.DataFrame
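
The two behaviors of transform, together with the column-order guarantee, can be illustrated with a toy function. This sketch does not perform any real featurization and is not the library's implementation; transform_sketch, the train_columns argument, the _sketch_target column name, and the choice to place the target last are all assumptions.

```python
from typing import List, Optional

import numpy as np
import pandas as pd

# Hypothetical name; the real target column name comes from the fitted transformer.
TARGET_COLUMN = "_sketch_target"


def transform_sketch(df: pd.DataFrame, train_columns: List[str],
                     y: Optional[np.ndarray] = None) -> pd.DataFrame:
    """Return df with training column order, carrying the target only if y is given."""
    out = df.copy()
    if y is not None:
        # In-sample case: carry the target so it stays aligned with the features.
        out[TARGET_COLUMN] = y
        ordered = list(train_columns) + [TARGET_COLUMN]
    else:
        # Out-of-sample case: features only.
        ordered = list(train_columns)
    # Restore the column order determined during training.
    return out[ordered]


features = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
without_y = transform_sketch(features, ["a", "b"])
with_y = transform_sketch(features, ["a", "b"], y=np.array([0.5, 0.7]))
```

In both calls the feature columns come back in the training order, regardless of their order in the input.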

Attributes

columns

Return the list of expected columns.

Returns

Type	Description
list	The list of columns.

has_unique_target_grains_dropper

lookback_features_removed

Return True if lookback features were removed due to memory limitations.

max_horizon

Return the max horizon.

parameters

Return the parameters needed to reconstruct the time series transformer.

target_imputation_marker_column_name

unique_target_grain_dropper

user_target_column_name

Get the target, or label, column name supplied by the user in AutoML configuration.

y_imputers

Return the imputer for target column.

Returns

Type	Description
dict	imputer for target column.

MISSING_Y

MISSING_Y = 'missing_y'

REMOVE_LAG_LEAD_WARN

REMOVE_LAG_LEAD_WARN = 'The lag-lead operator was removed due to memory limitation.'

REMOVE_ROLLING_WINDOW_WARN

REMOVE_ROLLING_WINDOW_WARN = 'The rolling window operator was removed due to memory limitation.'

SERIES_STATS_DICT

SERIES_STATS_DICT = 'series_stats_dict'
