TimeSeriesTransformer Class

Class for time series preprocessing.

Construct a TimeSeriesTransformer.

Inheritance
azureml.training.tabular.featurization._azureml_transformer.AzureMLTransformer
TimeSeriesTransformer
azureml.training.tabular.featurization._featurization_info_provider.FeaturizationInfoProvider
TimeSeriesTransformer

Constructor

TimeSeriesTransformer(pipeline: Pipeline, pipeline_type: TimeSeriesPipelineType, featurization_config: FeaturizationConfig, time_index_non_holiday_features: List[str], lookback_features_removed: bool, **kwargs: Any)

Parameters

Name Description
pipeline_type
Required
TimeSeriesPipelineType

Type of pipeline to construct. Either Full or Reduced for CV split featurizing.

featurization_config
Required

The featurization config for customization.

kwargs
Required

Dictionary containing metadata for the time series:

time_column_name: The column containing dates.
grain_column_names: The set of columns defining the multiple time series.
origin_column_name: The column containing the latest date from which actual values of all features are assumed to be known with certainty.
drop_column_names: The columns that need to be removed from the data set.
group: The group column name.

pipeline
Required
time_index_non_holiday_features
Required
lookback_features_removed
Required
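
As an illustration, the time series metadata described for kwargs could be collected in a dictionary and unpacked into the constructor. The keys below are the ones listed in the parameter description. Every column name used as a value is a hypothetical placeholder, and passing the dictionary with `**` is an assumption based on the `**kwargs` signature.

```python
# Keys come from the kwargs description; all values are hypothetical column names.
ts_metadata = {
    "time_column_name": "date",                  # column containing dates
    "grain_column_names": ["store", "product"],  # columns defining each series
    "origin_column_name": "origin",              # origin time column
    "drop_column_names": ["notes"],              # columns to remove from the data
    "group": "region",                           # group column name
}

# Assumed usage (not executed here; the other arguments must be built first):
# TimeSeriesTransformer(pipeline, pipeline_type, featurization_config,
#                       time_index_non_holiday_features,
#                       lookback_features_removed, **ts_metadata)
```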

Methods

add_dummy_order_column

Add the dummy order column to the pandas data frame.

fit

Fit the TimeSeriesTransformer.

fit_transform

Fit and transform data for a training scenario.

Please note that there is no row data contract for the output DataFrame. That is, the output may have a different number and ordering of rows than the input.

The steps here are:

  1. Fit the transformer: create the internal transform pipeline
  2. Common transform validation and preparation
  3. Fill datetime gaps and impute missing target values
  4. Call the internal pipeline's fit_transform method
  5. Common finalization
  6. Save the order of columns in the transformed data

get_auto_lag

Return the heuristically inferred lag.

Returns the heuristically defined target lag, or None if lags were not defined as auto. Raises ClientException if fit was not called.

get_auto_max_horizon

Return auto max horizon.

Returns the heuristically defined max_horizon, or None if max_horizon was not defined as auto. Can raise ClientException.

get_auto_window_size

Return the auto rolling window size.

Returns the heuristically defined rolling window size, or None if the rolling window was not defined as auto. Raises ClientException if fit was not called.

get_col_internal_type

Get the internal type of a featurized column. If it is a reserved column, return the column name. If it is a lag or rolling window column, return the corresponding type defined in the transformer class. If it is a user input column, return "other".

get_engineered_feature_names

Get the transformed column names.

get_featurization_summary

Return the featurization summary for all the input features seen by TimeSeriesTransformer.

get_json_strs_for_engineered_feature_names

Return JSON string list for engineered feature names.

get_params

get_target_lags

Return target lags if any.

get_target_rolling_window_size

Return the size of rolling window.

remove_rows_with_imputed_target

select_latest_origin_dates

Select rows from X with latest origin times within time-grain groups.

Logic: group X by time and grain, find the latest origin in each group, and return the row containing the latest origin for each group. The time_column_name parameter is the time column name from the data frame, time_series_id_column_names are the time series ID column names, and origin_column_name is the origin time column name. Returns the data frame containing only the latest origins.

transform

Transform data for a scoring scenario.

This transform has two different behaviors depending on whether the y input is given:

If y is not None, the output will contain the target quantity in the self.target_column_name column; this ensures that consumers of the transform can retrieve the target aligned to the transformed data. The transform will also fill time index gaps and impute missing target values when y is given. This behavior is usually best for in-sample scoring scenarios.

If y is None, the output is just the transformed feature DataFrame and will not have time index gaps filled. This behavior is usually best for out-of-sample scoring scenarios.

In either case, the output will contain the columns determined during fit/training and in the same order as that determined at fit/training. Please note that this method does not specify a contract for the rows of the output DataFrame. That is, the output may have a different number and ordering of rows than the input.

The transform steps are:

  1. Common validation and preparation
  2. Remove rows that do not conform to the frequency determined during training
  3. If y input is given, append to the input, fill gaps and impute missing target values
  4. Infer the scoring data frequency and check that it is compatible with training frequency
  5. Call the internal pipeline's transform method
  6. Add an indicator column for missing target values
  7. Common finalization
  8. Restore column order determined during training

add_dummy_order_column

Add the dummy order column to the pandas data frame.

add_dummy_order_column(X: DataFrame) -> None

Parameters

Name Description
X
Required

The data frame which will undergo order column addition.

fit

Fit the TimeSeriesTransformer.

fit(X: DataFrame, y: ndarray | None = None) -> TimeSeriesTransformer

Parameters

Name Description
X
Required

Dataframe representing text, numerical or categorical input.

y
default value: None

The target quantity.

Returns

Type Description

TimeSeriesTransformer

fit_transform

Fit and transform data for a training scenario.

Please note that there is no row data contract for the output DataFrame. That is, the output may have a different number and ordering of rows than the input.

The steps here are:

  1. Fit the transformer: create the internal transform pipeline
  2. Common transform validation and preparation
  3. Fill datetime gaps and impute missing target values
  4. Call the internal pipeline's fit_transform method
  5. Common finalization
  6. Save the order of columns in the transformed data

fit_transform(df: DataFrame, y: ndarray) -> DataFrame

Parameters

Name Description
df
Required

Dataframe representing text, numerical or categorical input.

y
Required

The target quantity.

Returns

Type Description

pandas.DataFrame
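
Step 3 above (fill datetime gaps and impute missing target values) can be sketched with plain pandas. This is a minimal illustration for a single series, not the library's implementation: the helper name, the column names, and the choice of forward fill as the imputation method are all assumptions.

```python
import numpy as np
import pandas as pd


def fill_gaps_and_impute(df, time_col, target_col, freq):
    """Insert rows for missing timestamps, then forward-fill the target.

    Hypothetical helper: forward fill is an assumed imputation method.
    """
    # Build the complete time grid between the first and last observation.
    full_index = pd.date_range(df[time_col].min(), df[time_col].max(), freq=freq)
    filled = (
        df.set_index(time_col)
        .reindex(full_index)      # adds NaN rows for the missing timestamps
        .rename_axis(time_col)
        .reset_index()
    )
    # Impute both the original NaN targets and the newly added gap rows.
    filled[target_col] = filled[target_col].ffill()
    return filled


df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
        "y": [10.0, np.nan, 13.0],
    }
)
result = fill_gaps_and_impute(df, "date", "y", "D")
```

The output has four rows even though the input had three, which is one reason the method makes no promise about the number or order of output rows.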

get_auto_lag

Return the heuristically inferred lag.

Returns the heuristically defined target lag, or None if lags were not defined as auto. Raises ClientException if fit was not called.

get_auto_lag() -> List[int] | None

get_auto_max_horizon

Return auto max horizon.

Returns the heuristically defined max_horizon, or None if max_horizon was not defined as auto. Can raise ClientException.

get_auto_max_horizon() -> int | None

get_auto_window_size

Return the auto rolling window size.

Returns the heuristically defined rolling window size, or None if the rolling window was not defined as auto. Raises ClientException if fit was not called.

get_auto_window_size() -> int | None

get_col_internal_type

Get the internal type of a featurized column. If it is a reserved column, return the column name. If it is a lag or rolling window column, return the corresponding type defined in the transformer class. If it is a user input column, return "other".

static get_col_internal_type(column_name: str) -> str

Parameters

Name Description
column_name
Required

The column name.

Returns

Type Description

If a column is generated by the AutoML SDK, the corresponding SDK type is returned. Otherwise, "other" is returned.

get_engineered_feature_names

Get the transformed column names.

get_engineered_feature_names() -> List[str] | None

Returns

Type Description

list of strings

get_featurization_summary

Return the featurization summary for all the input features seen by TimeSeriesTransformer.

get_featurization_summary(**kwargs: Any) -> List[Dict[str, Any | None]]

Keyword-Only Parameters

Name Description
is_user_friendly

If True, return individual transformer params as well, otherwise, only return the detailed featurization summary.

Returns

Type Description

List of featurization summary for each input feature.

get_json_strs_for_engineered_feature_names

Return JSON string list for engineered feature names.

get_json_strs_for_engineered_feature_names(engi_feature_name_list: List[str] | None = None) -> List[str]

Parameters

Name Description
engi_feature_name_list

Engineered feature names for which JSON strings are required.

default value: None

Returns

Type Description

JSON string list for engineered feature names

get_params

get_params(deep=True)

Parameters

Name Description
deep
default value: True

get_target_lags

Return target lags if any.

get_target_lags() -> List[int]

get_target_rolling_window_size

Return the size of rolling window.

get_target_rolling_window_size() -> int

remove_rows_with_imputed_target

remove_rows_with_imputed_target(X: DataFrame, y: ndarray) -> Tuple[DataFrame, ndarray]

Parameters

Name Description
X
Required
y
Required

select_latest_origin_dates

Select rows from X with latest origin times within time-grain groups.

Logic: group X by time and grain, find the latest origin in each group, and return the row containing the latest origin for each group. Returns the data frame containing only the latest origins.

static select_latest_origin_dates(X: DataFrame, time_column_name: str, time_series_id_column_names: List[str], origin_column_name: str) -> DataFrame

Parameters

Name Description
X
Required
time_column_name
Required

The time column name from the data frame.

time_series_id_column_names
Required

The time series ID column names.

origin_column_name
Required

Origin time column name.
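
The group-then-select logic described for this method can be sketched in plain pandas as follows. This is an illustration only: the helper name and the sample columns are hypothetical, and it assumes time, series ID, and origin are ordinary columns rather than index levels.

```python
import pandas as pd


def latest_origin_rows(X, time_col, id_cols, origin_col):
    """Keep, for each (time, series ID) group, the row with the latest origin."""
    keys = [time_col] + list(id_cols)
    return (
        X.sort_values(origin_col)                    # latest origin ends up last
        .drop_duplicates(subset=keys, keep="last")   # one row per group
        .sort_values(keys)
        .reset_index(drop=True)
    )


X = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-03", "2024-01-03", "2024-01-03", "2024-01-04", "2024-01-04"]
        ),
        "store": ["A", "A", "B", "A", "A"],
        "origin": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02", "2024-01-03"]
        ),
        "lag_value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
latest = latest_origin_rows(X, "date", ["store"], "origin")
```

Here the five input rows collapse to three, one per date and store combination, each carrying the value computed from the most recent origin.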

transform

Transform data for a scoring scenario.

This transform has two different behaviors depending on whether the y input is given:

If y is not None, the output will contain the target quantity in the self.target_column_name column; this ensures that consumers of the transform can retrieve the target aligned to the transformed data. The transform will also fill time index gaps and impute missing target values when y is given. This behavior is usually best for in-sample scoring scenarios.

If y is None, the output is just the transformed feature DataFrame and will not have time index gaps filled. This behavior is usually best for out-of-sample scoring scenarios.

In either case, the output will contain the columns determined during fit/training and in the same order as that determined at fit/training. Please note that this method does not specify a contract for the rows of the output DataFrame. That is, the output may have a different number and ordering of rows than the input.

The transform steps are:

  1. Common validation and preparation
  2. Remove rows that do not conform to the frequency determined during training
  3. If y input is given, append to the input, fill gaps and impute missing target values
  4. Infer the scoring data frequency and check that it is compatible with training frequency
  5. Call the internal pipeline's transform method
  6. Add an indicator column for missing target values
  7. Common finalization
  8. Restore column order determined during training

transform(df: DataFrame, y: ndarray | None = None) -> DataFrame

Parameters

Name Description
df
Required

Dataframe representing text, numerical or categorical input.

y
default value: None

The target quantity (optional).

Returns

Type Description

pandas.DataFrame
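
Step 2 above (remove rows that do not conform to the training frequency) might look like the following in plain pandas. The reference does not say how conformance is decided, so this sketch assumes one plausible interpretation: a row conforms if its timestamp lies on the grid generated from a training start date and the training frequency. The helper name and columns are hypothetical.

```python
import pandas as pd


def drop_off_frequency_rows(df, time_col, train_start, freq):
    """Keep rows whose timestamp lies on the grid train_start + k * freq."""
    grid = pd.date_range(start=train_start, end=df[time_col].max(), freq=freq)
    return df[df[time_col].isin(grid)].reset_index(drop=True)


scoring = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-08", "2024-01-10", "2024-01-15"]),
        "x": [1, 2, 3],
    }
)
# Weekly training data assumed to start on 2024-01-01.
kept = drop_off_frequency_rows(scoring, "date", pd.Timestamp("2024-01-01"), "7D")
```

The row dated 2024-01-10 is dropped because it does not fall on the weekly grid, so the output has fewer rows than the input.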

Attributes

columns

Return the list of expected columns.

Returns

Type Description

The list of columns.

has_unique_target_grains_dropper

lookback_features_removed

Return True if lookback features were removed due to memory limitations.

max_horizon

Return the max horizon.

parameters

Return the parameters needed to reconstruct the time series transformer.

target_imputation_marker_column_name

unique_target_grain_dropper

user_target_column_name

Get the target, or label, column name supplied by the user in AutoML configuration.

y_imputers

Return the imputer for the target column.

Returns

Type Description

Imputer for the target column.

MISSING_Y

MISSING_Y = 'missing_y'

REMOVE_LAG_LEAD_WARN

REMOVE_LAG_LEAD_WARN = 'The lag-lead operator was removed due to memory limitation.'

REMOVE_ROLLING_WINDOW_WARN

REMOVE_ROLLING_WINDOW_WARN = 'The rolling window operator was removed due to memory limitation.'

SERIES_STATS_DICT

SERIES_STATS_DICT = 'series_stats_dict'