TimeSeriesTransformer Class
Class for time series preprocessing.
Construct a TimeSeriesTransformer.
- Inheritance
-
azureml.training.tabular.featurization._azureml_transformer.AzureMLTransformer
TimeSeriesTransformer
azureml.training.tabular.featurization._featurization_info_provider.FeaturizationInfoProvider
TimeSeriesTransformer
Constructor
TimeSeriesTransformer(pipeline: Pipeline, pipeline_type: TimeSeriesPipelineType, featurization_config: FeaturizationConfig, time_index_non_holiday_features: List[str], lookback_features_removed: bool, **kwargs: Any)
Parameters
Name | Description |
---|---|
pipeline_type
Required
|
<xref:TimeSeriesPipelineType>
Type of pipeline to construct: either Full, or Reduced for CV split featurization. |
featurization_config
Required
|
The featurization config for customization. |
kwargs
Required
|
Dictionary containing metadata for the time series. time_column_name: The column containing dates. grain_column_names: The set of columns defining the multiple time series. origin_column_name: The latest date from which actual values of all features are assumed to be known with certainty. drop_column_names: The columns that need to be removed from the data set. group: The group column name. |
pipeline
Required
|
|
time_index_non_holiday_features
Required
|
|
lookback_features_removed
Required
|
|
Methods
add_dummy_order_column |
Add the dummy order column to the pandas data frame. |
fit |
Fit the TimeSeriesTransformer. |
fit_transform |
Fit and transform data for a training scenario. There is no row data contract for the output DataFrame: the output may have a different number and ordering of rows than the input.
|
get_auto_lag |
Return the heuristically inferred lag. If lags were not defined as auto, return None. ClientException is raised if fit was not called. |
get_auto_max_horizon |
Return the auto max horizon. If max_horizon was not defined as auto, return None. Can raise ClientException. |
get_auto_window_size |
Return the auto rolling window size. If the rolling window was not defined as auto, return None. ClientException is raised if fit was not called. |
get_col_internal_type |
Get the internal type of a featured column. If it is a reserved column, return the column name. If it is a lag/rolling window column, return corresponding types defined in the transformer class. If it is a user input column, return other. |
get_engineered_feature_names |
Get the transformed column names. |
get_featurization_summary |
Return the featurization summary for all the input features seen by TimeSeriesTransformer. |
get_json_strs_for_engineered_feature_names |
Return JSON string list for engineered feature names. |
get_params | |
get_target_lags |
Return target lags if any. |
get_target_rolling_window_size |
Return the size of rolling window. |
remove_rows_with_imputed_target | |
select_latest_origin_dates |
Select rows from X with latest origin times within time-grain groups. Logic: Group X by time, grain -> Find latest origin in each group -> Return row containing latest origin for each group. :param time_column_name: The time column name from data frame. :param time_series_id_column_names: The time series ID column names. :param origin_column_name: Origin time column name. :return: The data frame, containing only latest origins. |
transform |
Transform data for a scoring scenario. This transform has two different behaviors depending on whether the y input is given. If y is not None, the output will contain the target quantity in the self.target_column_name column; this ensures that consumers of the transform can retrieve the target aligned to the transformed data. The transform will also fill time index gaps and impute missing target values when y is given. This behavior is usually best for in-sample scoring scenarios. If y is None, the output is just the transformed feature DataFrame and will not have time index gaps filled. This behavior is usually best for out-of-sample scoring scenarios. In either case, the output will contain the columns determined during fit/training, in the same order as determined at fit/training. This method does not specify a contract for the rows of the output DataFrame: the output may have a different number and ordering of rows than the input.
|
add_dummy_order_column
Add the dummy order column to the pandas data frame.
add_dummy_order_column(X: DataFrame) -> None
Parameters
Name | Description |
---|---|
X
Required
|
The data frame which will undergo order column addition. |
fit
Fit the TimeSeriesTransformer.
fit(X: DataFrame, y: ndarray | None = None) -> TimeSeriesTransformer
Parameters
Name | Description |
---|---|
X
Required
|
Dataframe representing text, numerical or categorical input. |
y
|
The target quantity. default value: None
|
Returns
Type | Description |
---|---|
TimeSeriesTransformer |
fit_transform
Fit and transform data for a training scenario.
Please note that there is no row data contract for the output DataFrame. That is, the output may have a different number and ordering of rows than the input.
The steps here are:
- Fit the transformer: create the internal transform pipeline
- Common transform validation and preparation
- Fill datetime gaps and impute missing target values
- Call the internal pipeline's fit_transform method
- Common finalization
- Save the order of columns in the transformed data
fit_transform(df: DataFrame, y: ndarray) -> DataFrame
Parameters
Name | Description |
---|---|
df
Required
|
Dataframe representing text, numerical or categorical input. |
y
Required
|
The target quantity. |
Returns
Type | Description |
---|---|
pandas.DataFrame |
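The gap-filling step above is the reason the output can have more rows than the input. The following is a minimal pandas sketch of that idea, not the SDK's implementation: the helper name `fill_gaps_and_impute`, the `target_was_missing` column, and the choice of forward fill as the imputation method are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fill_gaps_and_impute(df, time_col, id_col, target_col, freq):
    """Reindex each series to a regular time grid, then forward-fill the target.

    Hypothetical helper: forward fill is an assumed imputation strategy.
    """
    pieces = []
    for series_id, group in df.groupby(id_col):
        group = group.set_index(time_col).sort_index()
        # Build the full, gap-free time index for this series.
        full_index = pd.date_range(group.index.min(), group.index.max(), freq=freq)
        filled = group.reindex(full_index).rename_axis(time_col)
        filled[id_col] = series_id
        # Remember which target values were missing before imputation.
        filled["target_was_missing"] = filled[target_col].isna()
        filled[target_col] = filled[target_col].ffill()
        pieces.append(filled.reset_index())
    return pd.concat(pieces, ignore_index=True)


raw = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
    "store": ["a", "a", "a"],
    "sales": [10.0, np.nan, 13.0],
})
result = fill_gaps_and_impute(raw, "date", "store", "sales", "D")
```

Here the three input rows become four output rows, because the missing 2024-01-03 row is inserted before the target is imputed.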
get_auto_lag
Return the heuristically inferred lag.
If lags were not defined as auto, return None; otherwise return the heuristically defined target lag. ClientException is raised if fit was not called.
get_auto_lag() -> List[int] | None
get_auto_max_horizon
Return auto max horizon.
If max_horizon was not defined as auto, return None; otherwise return the heuristically defined max_horizon. Can raise ClientException.
get_auto_max_horizon() -> int | None
get_auto_window_size
Return the auto rolling window size.
If the rolling window was not defined as auto, return None; otherwise return the heuristically defined rolling window size. ClientException is raised if fit was not called.
get_auto_window_size() -> int | None
get_col_internal_type
Get the internal type of a featured column. If it is a reserved column, return the column name. If it is a lag/rolling window column, return corresponding types defined in the transformer class. If it is a user input column, return other.
static get_col_internal_type(column_name: str) -> str
Parameters
Name | Description |
---|---|
column_name
Required
|
The column name. |
Returns
Type | Description |
---|---|
If a column is generated by AutoML SDK, it will return the corresponding SDK type. If not, it will return "other" |
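The three-way rule described for get_col_internal_type (reserved column, lag/rolling window column, or user column) can be sketched as follows. This is an illustration only: the reserved names, the regular expressions, and the returned type strings `"lag"` and `"rolling_window"` are hypothetical and are not taken from the AutoML SDK.

```python
import re

# Hypothetical values for illustration; the SDK defines its own names and types.
RESERVED_COLUMNS = {"reserved_origin", "reserved_target"}
LAG_PATTERN = re.compile(r"_lag\d+")
ROLLING_WINDOW_PATTERN = re.compile(r"_window\d+")


def col_internal_type(column_name: str) -> str:
    """Classify a column name using the reserved / lag / rolling window / other rule."""
    if column_name in RESERVED_COLUMNS:
        # Reserved columns are reported under their own name.
        return column_name
    if LAG_PATTERN.search(column_name):
        return "lag"
    if ROLLING_WINDOW_PATTERN.search(column_name):
        return "rolling_window"
    # Anything else is treated as a user input column.
    return "other"


types = [
    col_internal_type(name)
    for name in ["reserved_origin", "sales_lag2", "sales_mean_window3", "price"]
]
```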
get_engineered_feature_names
Get the transformed column names.
get_engineered_feature_names() -> List[str] | None
Returns
Type | Description |
---|---|
list of strings |
get_featurization_summary
Return the featurization summary for all the input features seen by TimeSeriesTransformer.
get_featurization_summary(**kwargs: Any) -> List[Dict[str, Any | None]]
Keyword-Only Parameters
Name | Description |
---|---|
is_user_friendly
|
If True, return individual transformer params as well, otherwise, only return the detailed featurization summary. |
Returns
Type | Description |
---|---|
List of featurization summary for each input feature. |
get_json_strs_for_engineered_feature_names
Return JSON string list for engineered feature names.
get_json_strs_for_engineered_feature_names(engi_feature_name_list: List[str] | None = None) -> List[str]
Parameters
Name | Description |
---|---|
engi_feature_name_list
|
Engineered feature names for which JSON strings are required. default value: None
|
Returns
Type | Description |
---|---|
JSON string list for engineered feature names |
get_params
get_params(deep=True)
Parameters
Name | Description |
---|---|
deep
|
default value: True
|
get_target_lags
Return target lags if any.
get_target_lags() -> List[int]
get_target_rolling_window_size
Return the size of rolling window.
get_target_rolling_window_size() -> int
remove_rows_with_imputed_target
remove_rows_with_imputed_target(X: DataFrame, y: ndarray) -> Tuple[DataFrame, ndarray]
Parameters
Name | Description |
---|---|
X
Required
|
|
y
Required
|
|
select_latest_origin_dates
Select rows from X with latest origin times within time-grain groups.
Logic: group X by time and grain, find the latest origin in each group, and return the row containing the latest origin for each group.
static select_latest_origin_dates(X: DataFrame, time_column_name: str, time_series_id_column_names: List[str], origin_column_name: str) -> DataFrame
Parameters
Name | Description |
---|---|
X
Required
|
|
time_column_name
Required
|
The time column name from data frame. |
time_series_id_column_names
Required
|
The time series ID column names. |
origin_column_name
Required
|
Origin time column name. |
Returns
Type | Description |
---|---|
pandas.DataFrame |
The data frame, containing only latest origins. |
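The group-then-pick-latest logic of select_latest_origin_dates can be approximated in plain pandas. The sketch below is an assumption about how such a selection could be written, not the SDK's code; the function name `latest_origin_rows` and the sample column names are hypothetical.

```python
import pandas as pd


def latest_origin_rows(X, time_column_name, time_series_id_column_names, origin_column_name):
    """Keep, for each (time, series ID) group, the row with the latest origin."""
    keys = [time_column_name] + list(time_series_id_column_names)
    # After sorting by origin, the last row of each group has the latest origin.
    ordered = X.sort_values(origin_column_name)
    latest = ordered.drop_duplicates(subset=keys, keep="last")
    return latest.sort_values(keys).reset_index(drop=True)


X = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-03", "2024-01-04"]),
    "store": ["a", "a", "a"],
    "origin": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"]),
    "sales_lag1": [5.0, 6.0, 7.0],
})
latest = latest_origin_rows(X, "date", ["store"], "origin")
```

For 2024-01-03 there are two candidate rows with different origins; only the one with origin 2024-01-02 survives.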
transform
Transform data for a scoring scenario.
This transform has two different behaviors depending on whether the y input is given:
If y is not None, the output will contain the target quantity in the self.target_column_name column; this ensures that consumers of the transform can retrieve the target aligned to the transformed data. The transform will also fill time index gaps and impute missing target values when y is given. This behavior is usually best for in-sample scoring scenarios.
If y is None, the output is just the transformed feature DataFrame and will not have time index gaps filled. This behavior is usually best for out-of-sample scoring scenarios.
In either case, the output will contain the columns determined during fit/training and in the same order as that determined at fit/training. Please note that this method does not specify a contract for the rows of the output DataFrame. That is, the output may have a different number and ordering of rows than the input.
The transform steps are:
- Common validation and preparation
- Remove rows that do not conform to the frequency determined during training
- If y input is given, append to the input, fill gaps and impute missing target values
- Infer the scoring data frequency and check that it is compatible with training frequency
- Call the internal pipeline's transform method
- Add an indicator column for missing target values
- Common finalization
- Restore column order determined during training
transform(df: DataFrame, y: ndarray | None = None) -> DataFrame
Parameters
Name | Description |
---|---|
df
Required
|
Dataframe representing text, numerical or categorical input. |
y
|
The target quantity. default value: None
|
Returns
Type | Description |
---|---|
pandas.DataFrame |
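The column contract described above (fit-time columns in fit-time order, plus the target column only when y is given) can be illustrated with a small pandas sketch. It covers only the final column layout, not gap filling or imputation. The helper `restore_fitted_layout`, the `TARGET_COLUMN` name, and the placement of the target as the last column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

TARGET_COLUMN = "target"  # hypothetical stand-in for self.target_column_name


def restore_fitted_layout(features, fitted_columns, y=None):
    """Return the fit-time columns in fit-time order; attach the target if y is given."""
    # reindex keeps only the listed columns and puts them in the listed order.
    out = features.reindex(columns=fitted_columns)
    if y is not None:
        # Assumed placement: the aligned target is appended as the last column.
        out[TARGET_COLUMN] = np.asarray(y)
    return out


fitted_columns = ["store", "day_of_week", "price"]
scoring = pd.DataFrame({
    "price": [1.5, 2.0],
    "extra": [0, 1],
    "store": ["a", "a"],
    "day_of_week": [0, 1],
})
without_y = restore_fitted_layout(scoring, fitted_columns)
with_y = restore_fitted_layout(scoring, fitted_columns, y=[10.0, 12.0])
```

In this sketch the `extra` column, which was not seen at fit time, is dropped in both cases, and only the call with y carries the target column.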
Attributes
columns
has_unique_target_grains_dropper
lookback_features_removed
Returns True if lookback features were removed due to memory limitations.
max_horizon
Return the max horizon.
parameters
Return the parameters needed to reconstruct the time series transformer.
target_imputation_marker_column_name
unique_target_grain_dropper
user_target_column_name
Get the target, or label, column name supplied by the user in AutoML configuration.
y_imputers
MISSING_Y
MISSING_Y = 'missing_y'
REMOVE_LAG_LEAD_WARN
REMOVE_LAG_LEAD_WARN = 'The lag-lead operator was removed due to memory limitation.'
REMOVE_ROLLING_WINDOW_WARN
REMOVE_ROLLING_WINDOW_WARN = 'The rolling window operator was removed due to memory limitation.'
SERIES_STATS_DICT
SERIES_STATS_DICT = 'series_stats_dict'