|
LagLeadOperator
|
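A transformation class for computing lagged and lead values.
Works on regular time series data sets, with or without sparsity.
There are two lagging options: 'lag_by_time' and 'lag_by_occurrence'. Default: 'lag_by_occurrence'. 'lag_by_time' suits (almost) evenly spaced time series data, while 'lag_by_occurrence' is better suited to unevenly spaced data. The module picks the appropriate option by checking the sparsity of the data.
It is used as a featurization step inside the forecasting pipeline.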
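To make the difference between the two lagging options concrete, here is a minimal pandas-only sketch. It does not call LagLeadOperator; the series, the variable names, and the one-month step are assumptions chosen for illustration, and the library's internal implementation may differ.

```python
import pandas as pd

# Hypothetical monthly series with March 2017 missing (illustrative data only).
sales = pd.Series(
    [8, 9, 10],
    index=pd.to_datetime(["2017-01-01", "2017-02-01", "2017-04-01"]),
    name="sales",
)

# Occurrence-based lag: take the previous observed row, whatever its date.
lag_by_occurrence = sales.shift(1)

# Time-based lag: move every timestamp forward by one month start, then
# look up each original date. A date whose previous month was never
# observed gets NaN.
lag_by_time = sales.shift(1, freq="MS").reindex(sales.index)

print(pd.DataFrame({
    "sales": sales,
    "lag_by_occurrence": lag_by_occurrence,
    "lag_by_time": lag_by_time,
}))
```

For the April row, the occurrence-based lag returns the February value, whereas the time-based lag is missing because March was never observed. On evenly spaced data the two columns would coincide.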
Example 1 (evenly spaced time series data; 'lag_by_time' is used): construct a small data frame:
>>> import pandas as pd
>>> raw_data = {'store': ['storeA'] * 3 + ['storeB'] * 4,
...             'date': pd.to_datetime(
...                 ['2017-01-01', '2017-02-01', '2017-03-01'] * 2 +
...                 ['2017-04-01']),
...             'sales': range(8, 15)}
>>> tsds = TimeSeriesDataSet(
...     data=pd.DataFrame(raw_data),
...     time_series_id_column_names=['store'], time_column_name=['date'],
...     target_column_name='sales')
>>> # Sort in place (assumes tsds.data is the underlying DataFrame),
>>> # so that tsds remains a TimeSeriesDataSet.
>>> tsds.data.sort_index(inplace=True)
>>> tsds.data
                   sales
store  date
storeA 2017-01-01      8
       2017-02-01      9
       2017-03-01     10
storeB 2017-01-01     11
       2017-02-01     12
       2017-03-01     13
       2017-04-01     14
>>> tsds = MaxHorizonFeaturizer(1).fit_transform(tsds)
>>> tsds
                              sales  horizon_origin
store  date       origin
storeA 2017-01-01 2016-12-01      8               1
       2017-02-01 2017-01-01      9               1
       2017-03-01 2017-02-01     10               1
storeB 2017-01-01 2016-12-01     11               1
       2017-02-01 2017-01-01     12               1
       2017-03-01 2017-02-01     13               1
       2017-04-01 2017-03-01     14               1
>>> make_lags = LagLeadOperator(
...     lags_to_construct={'sales': [-1, 1]})
>>> make_lags.fit(tsds)
>>> result = make_lags.transform(tsds)
>>> result.data
                              sales  horizon_origin  sales_lead1  sales_lag1
store  date       origin
storeA 2017-01-01 2016-12-01      8               1         9.00         nan
       2017-02-01 2017-01-01      9               1        10.00        8.00
       2017-03-01 2017-02-01     10               1          nan        9.00
storeB 2017-01-01 2016-12-01     11               1        12.00         nan
       2017-02-01 2017-01-01     12               1        13.00       11.00
       2017-03-01 2017-02-01     13               1        14.00       12.00
       2017-04-01 2017-03-01     14               1          nan       13.00
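The sign convention in the output above (a negative entry yields a lead column, a positive entry a lag column) can be sketched with plain pandas. This is an illustration under assumptions, not the operator's implementation: it relies on the data being evenly spaced, so a per-store row shift stands in for a one-month time shift, and the column naming simply mimics the output shown.

```python
import pandas as pd

# Same toy data as Example 1, indexed by (store, date).
df = pd.DataFrame({
    "store": ["storeA"] * 3 + ["storeB"] * 4,
    "date": pd.to_datetime(
        ["2017-01-01", "2017-02-01", "2017-03-01"] * 2 + ["2017-04-01"]),
    "sales": range(8, 15),
}).set_index(["store", "date"]).sort_index()

# Read a spec like {'sales': [-1, 1]} as: -1 is a one-step lead, 1 a one-step lag.
for k in [-1, 1]:
    name = f"sales_lag{k}" if k > 0 else f"sales_lead{-k}"
    # Shift within each store so values never leak across series.
    df[name] = df.groupby(level="store")["sales"].shift(k)

print(df)
```

Grouping by store before shifting is what keeps the first row of storeB from picking up the last value of storeA.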
Example 2 (unevenly spaced time series data; 'lag_by_occurrence' is used): construct a small TimeSeriesDataSet:
>>> raw_data = {'store': ['storeA'] * 3 + ['storeB'] * 4,
...             'date': pd.to_datetime(
...                 ['2017-01-01', '2017-02-01', '2017-04-01'] * 2 +
...                 ['2017-07-01']),
...             'sales': range(8, 15)}
>>> tsds = TimeSeriesDataSet(
...     data=pd.DataFrame(raw_data),
...     time_series_id_column_names=['store'], time_column_name=['date'],
...     target_column_name='sales')
>>> # Sort in place (assumes tsds.data is the underlying DataFrame),
>>> # so that tsds remains a TimeSeriesDataSet.
>>> tsds.data.sort_index(inplace=True)
>>> tsds.data
                   sales
store  date
storeA 2017-01-01      8
       2017-02-01      9
       2017-04-01     10
storeB 2017-01-01     11
       2017-02-01     12
       2017-04-01     13
       2017-07-01     14
>>> tsds = MaxHorizonFeaturizer(1).fit_transform(tsds)
>>> tsds
                              sales  horizon_origin
store  date       origin
storeA 2017-01-01 2016-12-01      8               1
       2017-02-01 2017-01-01      9               1
       2017-04-01 2017-03-01     10               1
storeB 2017-01-01 2016-12-01     11               1
       2017-02-01 2017-01-01     12               1
       2017-04-01 2017-03-01     13               1
       2017-07-01 2017-06-01     14               1
>>> make_lags = LagLeadOperator(
...     lags_to_construct={'sales': [1]})
>>> make_lags.fit(tsds)
>>> result = make_lags.transform(tsds)
>>> result.data
                              sales  horizon_origin  sales_occurrence_lag1  date_occurrence_lag1_timeDiffDays
store  date       origin
storeA 2017-01-01 2016-12-01      8               1                    nan                                nan
       2017-02-01 2017-01-01      9               1                   8.00                                 31
       2017-04-01 2017-03-01     10               1                   9.00                                 59
storeB 2017-01-01 2016-12-01     11               1                    nan                                nan
       2017-02-01 2017-01-01     12               1                  11.00                                 31
       2017-04-01 2017-03-01     13               1                  12.00                                 59
       2017-07-01 2017-06-01     14               1                  13.00                                 91
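The two occurrence-based columns above pair each lagged value with the number of days since the previous observation. A rough pandas equivalent is sketched below; it is an assumption about what the columns represent, inferred from the output, and it reuses the column names only for readability.

```python
import pandas as pd

# Same toy data as Example 2, kept as flat columns for simplicity.
df = pd.DataFrame({
    "store": ["storeA"] * 3 + ["storeB"] * 4,
    "date": pd.to_datetime(
        ["2017-01-01", "2017-02-01", "2017-04-01"] * 2 + ["2017-07-01"]),
    "sales": range(8, 15),
}).sort_values(["store", "date"]).reset_index(drop=True)

grouped = df.groupby("store")

# Previous observed sales value within each store.
df["sales_occurrence_lag1"] = grouped["sales"].shift(1)

# Days elapsed since that previous observation (NaN for the first row of a store).
df["date_occurrence_lag1_timeDiffDays"] = (
    df["date"] - grouped["date"].shift(1)
).dt.days

print(df)
```

Keeping the gap in days next to the lagged value lets a downstream model distinguish a one-month-old observation from a three-month-old one.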
|