lag_lead_operator module

Creates lags and leads (negative lags) of the target and features.

LagLeadOperator

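A transformation class for computing lagged and led values. It works with regular time series data sets, whether sparse or non-sparse. There are two lag options: "lag_by_time" and "lag_by_occurrence". The default is "lag_by_occurrence". "lag_by_time" suits (nearly) evenly spaced time series data, while "lag_by_occurrence" is better suited to unevenly spaced data. The module selects the appropriate option by checking the sparsity of the data.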

This is used as a featurization step in the forecasting pipeline.
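
The difference between the two options can be illustrated with plain pandas. This is only a sketch of the concept on made-up data, not the code LagLeadOperator runs internally; the variable names are chosen for illustration.

```python
import pandas as pd

# Monthly series with March missing (hypothetical data).
s = pd.Series(
    [8, 9, 10],
    index=pd.to_datetime(['2017-01-01', '2017-02-01', '2017-04-01']),
    name='sales',
)

# Lag by occurrence: take the previous observed row, whatever its date.
lag_by_occurrence = s.shift(1)

# Lag by time: take the value observed exactly one month earlier, if any.
one_month_earlier = s.index - pd.DateOffset(months=1)
lag_by_time = pd.Series(s.reindex(one_month_earlier).to_numpy(), index=s.index)

comparison = pd.DataFrame({
    'sales': s,
    'lag_by_occurrence': lag_by_occurrence,
    'lag_by_time': lag_by_time,
})
print(comparison)
```

For the April row, the occurrence-based lag picks up the February value, while the time-based lag is missing because no March observation exists. On sparse data a time-based lag therefore produces many missing values, which is why the occurrence-based option fits unevenly spaced series better.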

Example 1 (evenly spaced time series data, Lag_By_Time is used): construct a small data frame:


>>> import pandas as pd
>>> raw_data = {'store': ['storeA'] * 3 + ['storeB'] * 4,
...             'date' : pd.to_datetime(
...                 ['2017-01-01', '2017-02-01', '2017-03-01'] * 2 +
...                 ['2017-04-01'] ),
...             'sales': range(8, 15)}
>>> tsds = TimeSeriesDataSet(
...    data=pd.DataFrame(raw_data),
...    time_series_id_column_names=['store'], time_column_name=['date'],
...    target_column_name='sales')
>>> tsds = tsds.data.sort_index()
>>> tsds.data
                        sales
store      date
storeA     2017-01-01      8
           2017-02-01      9
           2017-03-01     10
storeB     2017-01-01     11
           2017-02-01     12
           2017-03-01     13
           2017-04-01     14
>>> tsds=MaxHorizonFeaturizer(1).fit_transform(tsds)
>>> tsds
                                 sales  horizon_origin
store      date       origin
storeA     2017-01-01 2016-12-01     8               1
           2017-02-01 2017-01-01     9               1
           2017-03-01 2017-02-01    10               1
storeB     2017-01-01 2016-12-01    11               1
           2017-02-01 2017-01-01    12               1
           2017-03-01 2017-02-01    13               1
           2017-04-01 2017-03-01    14               1
>>> make_lags = LagLeadOperator(
...                 lags_to_construct={'sales': [-1, 1]})
>>> make_lags.fit(tsds)
>>> result = make_lags.transform(tsds)
>>> result.data
                                 sales  horizon_origin  sales_lead1  sales_lag1
store      date       origin
storeA     2017-01-01 2016-12-01     8               1         9.00         nan
           2017-02-01 2017-01-01     9               1        10.00        8.00
           2017-03-01 2017-02-01    10               1          nan        9.00
storeB     2017-01-01 2016-12-01    11               1        12.00         nan
           2017-02-01 2017-01-01    12               1        13.00       11.00
           2017-03-01 2017-02-01    13               1        14.00       12.00
           2017-04-01 2017-03-01    14               1          nan       13.00
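
The sales_lead1 and sales_lag1 columns above can be reproduced with plain pandas, which may help when checking results by hand. The sketch below is an assumption about how a time-based lag can be computed (shift the dates by whole months and join back); shift_by_months is a hypothetical helper, not part of the library.

```python
import pandas as pd

raw_data = {
    'store': ['storeA'] * 3 + ['storeB'] * 4,
    'date': pd.to_datetime(
        ['2017-01-01', '2017-02-01', '2017-03-01'] * 2 + ['2017-04-01']),
    'sales': range(8, 15),
}
df = pd.DataFrame(raw_data)


def shift_by_months(frame, months, new_name):
    """Add a column holding sales from `months` months earlier (negative = later)."""
    shifted = frame[['store', 'date', 'sales']].copy()
    # A value observed at date d becomes the lagged value for date d + months.
    shifted['date'] = shifted['date'] + pd.DateOffset(months=months)
    shifted = shifted.rename(columns={'sales': new_name})
    # The left join keeps every original row; unmatched dates become NaN.
    return frame.merge(shifted, on=['store', 'date'], how='left')


result = shift_by_months(df, -1, 'sales_lead1')
result = shift_by_months(result, 1, 'sales_lag1')
print(result.set_index(['store', 'date']))
```

Joining on both store and date keeps each series separate, so the first lag and the last lead of every store are missing, as in the output above.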

Example 2 (unevenly spaced time series data, Lag_By_Occurrence is used): construct a small TimeSeriesDataSet:


>>> raw_data = {'store': ['storeA'] * 3 + ['storeB'] * 4,
...             'date' : pd.to_datetime(
...                 ['2017-01-01', '2017-02-01', '2017-04-01'] * 2 +
...                 ['2017-07-01'] ),
...             'sales': range(8, 15)}
>>> tsds = TimeSeriesDataSet(
...    data=pd.DataFrame(raw_data),
...    time_series_id_column_names=['store'], time_column_name=['date'],
...    target_column_name='sales')
>>> tsds = tsds.data.sort_index()
>>> tsds.data
                        sales
store      date
storeA     2017-01-01      8
           2017-02-01      9
           2017-04-01     10
storeB     2017-01-01     11
           2017-02-01     12
           2017-04-01     13
           2017-07-01     14
>>> tsds=MaxHorizonFeaturizer(1).fit_transform(tsds)
>>> tsds
                                 sales  horizon_origin
store      date       origin
storeA     2017-01-01 2016-12-01     8               1
           2017-02-01 2017-01-01     9               1
           2017-04-01 2017-03-01    10               1
storeB     2017-01-01 2016-12-01    11               1
           2017-02-01 2017-01-01    12               1
           2017-04-01 2017-03-01    13               1
           2017-07-01 2017-06-01    14               1
>>> make_lags = LagLeadOperator(
...                 lags_to_construct={'sales': [1]})
>>> make_lags.fit(tsds)
>>> result = make_lags.transform(tsds)
>>> result.data
                                 sales  horizon_origin   sales_occurrence_lag1  date_occurrence_lag1_timeDiffDays
store      date       origin
storeA     2017-01-01 2016-12-01     8               1                     nan                                nan
           2017-02-01 2017-01-01     9               1                    8.00                                 31
           2017-04-01 2017-03-01    10               1                    9.00                                 59
storeB     2017-01-01 2016-12-01    11               1                     nan                                nan
           2017-02-01 2017-01-01    12               1                   11.00                                 31
           2017-04-01 2017-03-01    13               1                   12.00                                 59
           2017-07-01 2017-06-01    14               1                   13.00                                 91
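
The occurrence-based columns above can likewise be checked with plain pandas. The sketch below assumes that the lag is the previous observed row within each store and that the timeDiffDays column is the number of days since that row; it reuses the output column names but is not the library's implementation.

```python
import pandas as pd

raw_data = {
    'store': ['storeA'] * 3 + ['storeB'] * 4,
    'date': pd.to_datetime(
        ['2017-01-01', '2017-02-01', '2017-04-01'] * 2 + ['2017-07-01']),
    'sales': range(8, 15),
}
df = pd.DataFrame(raw_data).sort_values(['store', 'date'])

# Previous observed sales value within each store.
df['sales_occurrence_lag1'] = df.groupby('store')['sales'].shift(1)

# Days elapsed since that previous observation.
previous_date = df.groupby('store')['date'].shift(1)
df['date_occurrence_lag1_timeDiffDays'] = (df['date'] - previous_date).dt.days

print(df.set_index(['store', 'date']))
```

The day differences (31, 59 and 91) record how far back each lagged value actually lies, information that an occurrence-based lag alone would otherwise lose.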