Auto-train a time-series forecast model

In this article, you learn how to configure and train a time-series forecasting regression model using automated machine learning (AutoML) in the Azure Machine Learning Python SDK.

To do so, you:

  • Prepare data for time series modeling.
  • Configure specific time-series parameters in an AutoMLConfig object.
  • Run predictions with time-series data.

For a low-code experience, see Tutorial: Forecast demand with automated machine learning for a time-series forecasting example using automated machine learning in the Azure Machine Learning studio.

Unlike classical time series methods, in automated ML, past time-series values are "pivoted" to become additional dimensions for the regressor together with other predictors. This approach incorporates multiple contextual variables and their relationship to one another during training. Since multiple factors can influence a forecast, this method aligns well with real-world forecasting scenarios. For example, when forecasting sales, interactions of historical trends, exchange rate, and price all jointly drive the sales outcome.

Prerequisites

For this article you need:

  • An Azure Machine Learning workspace. To create the workspace, see Create an Azure Machine Learning workspace.

  • Some familiarity with setting up an automated machine learning experiment. Follow the tutorial or how-to to see the main automated machine learning experiment design patterns.

Preparing data

The most important difference between a forecasting regression task type and a regression task type within AutoML is that your data must include a feature that represents a valid time series. A regular time series has a well-defined and consistent frequency and has a value at every sample point in a continuous time span.

Consider the following snapshot of a file sample.csv. This data set contains daily sales data for a company that has two different stores, A and B.

Additionally, there are features for:

  • week_of_year: allows the model to detect weekly seasonality.
  • day_datetime: represents a clean time series with daily frequency.
  • sales_quantity: the target column for running predictions.
day_datetime,store,sales_quantity,week_of_year
9/3/2018,A,2000,36
9/3/2018,B,600,36
9/4/2018,A,2300,36
9/4/2018,B,550,36
9/5/2018,A,2100,36
9/5/2018,B,650,36
9/6/2018,A,2400,36
9/6/2018,B,700,36
9/7/2018,A,2450,36
9/7/2018,B,650,36

Read the data into a Pandas dataframe, then use the to_datetime function to ensure the time series is a datetime type.

import pandas as pd
data = pd.read_csv("sample.csv")
data["day_datetime"] = pd.to_datetime(data["day_datetime"])

In this case, the data is already sorted ascending by the time field day_datetime. However, when setting up an experiment, ensure the desired time column is sorted in ascending order to build a valid time series.
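When the input is not already ordered, the sort and a basic regularity check can be sketched as follows. This is a minimal illustration on a deliberately shuffled copy of a few sample.csv rows; the inline csv_text and the regular dictionary are assumptions made for the example, not part of the original data pipeline.

```python
import io
import pandas as pd

# A shuffled excerpt of the sample data (inlined here only for illustration).
csv_text = """day_datetime,store,sales_quantity,week_of_year
9/4/2018,A,2300,36
9/3/2018,B,600,36
9/3/2018,A,2000,36
9/5/2018,A,2100,36
9/4/2018,B,550,36
9/5/2018,B,650,36
"""

data = pd.read_csv(io.StringIO(csv_text))
data["day_datetime"] = pd.to_datetime(data["day_datetime"], format="%m/%d/%Y")

# Sort ascending by time (then store) so every series is in chronological order.
data = data.sort_values(["day_datetime", "store"]).reset_index(drop=True)

# For each store, check that consecutive rows are exactly one day apart.
regular = {}
for store, group in data.groupby("store"):
    steps = group["day_datetime"].diff().dropna()
    regular[store] = bool((steps == pd.Timedelta(days=1)).all())

print(regular)
```

A store that reports False here has gaps or duplicate timestamps that are worth inspecting before training.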

The following code:

  • Assumes the data contains 1,000 records, and makes a deterministic split in the data to create training and test data sets.
  • Identifies the label column as sales_quantity.
  • Separates the label field from test_data to form the test_labels set.
train_data = data.iloc[:950]
test_data = data.iloc[-50:]

label =  "sales_quantity"
 
test_labels = test_data.pop(label).values
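The positional split above relies on a known row count. A variant that splits on the timestamp instead, so that every store contributes the same final periods to the test set, might look like the sketch below. The helper split_last_periods and the small inline data frame are hypothetical and exist only for this illustration.

```python
import pandas as pd

def split_last_periods(df, time_col, n_test_periods):
    """Hold out every row that falls in the last n_test_periods timestamps."""
    periods = df[time_col].drop_duplicates().sort_values()
    test_periods = periods.iloc[-n_test_periods:]
    is_test = df[time_col].isin(test_periods)
    return df[~is_test].copy(), df[is_test].copy()

# Four days of data for two stores (values taken from the sample snapshot).
data = pd.DataFrame({
    "day_datetime": pd.to_datetime(
        ["2018-09-03", "2018-09-03", "2018-09-04", "2018-09-04",
         "2018-09-05", "2018-09-05", "2018-09-06", "2018-09-06"]),
    "store": ["A", "B"] * 4,
    "sales_quantity": [2000, 600, 2300, 550, 2100, 650, 2400, 700],
})

label = "sales_quantity"
train_data, test_data = split_last_periods(data, "day_datetime", n_test_periods=1)

# Separate the label from the test features, as in the positional split.
test_labels = test_data.pop(label).values
print(len(train_data), len(test_data), test_labels)
```

Because the cut is made on time rather than row position, no training row is later than any test row.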

Important

When training a model for forecasting future values, ensure all the features used in training can be used when running predictions for your intended horizon.

For example, when creating a demand forecast, including a feature for current stock price could massively increase training accuracy. However, if you intend to forecast with a long horizon, you may not be able to accurately predict future stock values corresponding to future time-series points, and model accuracy could suffer.

Training and validation data

You can specify separate train and validation sets directly in the AutoMLConfig object. Learn more about the AutoMLConfig.

For time series forecasting, only Rolling Origin Cross Validation (ROCV) is used for validation by default. Pass the training and validation data together, and set the number of cross-validation folds with the n_cross_validations parameter in your AutoMLConfig. ROCV divides the series into training and validation data using an origin time point. Sliding the origin in time generates the cross-validation folds. This strategy preserves the time series data integrity and eliminates the risk of data leakage.

Figure: rolling origin cross-validation
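The idea of sliding the origin can be sketched with plain index arithmetic. This is only an illustration of the general rolling-origin scheme; the function rolling_origin_folds and its fold layout (non-overlapping validation windows of one horizon each) are assumptions, and the exact folds AutoML builds internally may differ.

```python
def rolling_origin_folds(n_points, n_folds, horizon):
    """Return (train_indices, validation_indices) pairs, oldest fold first."""
    folds = []
    for k in range(n_folds, 0, -1):
        # The origin is the first validation index; everything before it is training data.
        origin = n_points - k * horizon
        if origin <= 0:
            raise ValueError("not enough data points for the requested folds")
        train_idx = list(range(origin))
        valid_idx = list(range(origin, origin + horizon))
        folds.append((train_idx, valid_idx))
    return folds

folds = rolling_origin_folds(n_points=10, n_folds=3, horizon=2)
for train_idx, valid_idx in folds:
    print("train up to", train_idx[-1], "validate on", valid_idx)
```

In every fold the validation points come strictly after the training points, which is what prevents future information from leaking into training.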

You can also bring your own validation data. Learn more in Configure data splits and cross-validation in AutoML.

automl_config = AutoMLConfig(task='forecasting',
                             n_cross_validations=3,
                             ...
                             **time_series_settings)

Learn more about how AutoML applies cross validation to prevent over-fitting models.

Configure experiment

The AutoMLConfig object defines the settings and data necessary for an automated machine learning task. Configuration for a forecasting model is similar to the setup of a standard regression model, but certain models, configuration options, and featurization steps exist specifically for time-series data.

Supported models

Automated machine learning automatically tries different models and algorithms as part of the model creation and tuning process. As a user, there is no need for you to specify the algorithm. For forecasting experiments, both native time-series and deep learning models are part of the recommendation system. The following table summarizes this subset of models.

Tip

Traditional regression models are also tested as part of the recommendation system for forecasting experiments. See the supported model table for the full list of models.

Models | Description | Benefits
Prophet (Preview) | Prophet works best with time series that have strong seasonal effects and several seasons of historical data. To leverage this model, install it locally using pip install fbprophet. | Accurate and fast; robust to outliers, missing data, and dramatic changes in your time series.
Auto-ARIMA (Preview) | Auto-Regressive Integrated Moving Average (ARIMA) performs best when the data is stationary. This means that its statistical properties, like the mean and variance, are constant over the entire set. For example, if you flip a coin, the probability of getting heads is 50%, regardless of whether you flip today, tomorrow, or next year. | Great for univariate series, since the past values are used to predict the future values.
ForecastTCN (Preview) | ForecastTCN is a neural network model designed to tackle the most demanding forecasting tasks, capturing nonlinear local and global trends in your data as well as relationships between time series. | Capable of leveraging complex trends in your data and readily scales to the largest of datasets.
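To make the stationarity requirement in the Auto-ARIMA row concrete, the sketch below compares a trending series with its first difference. The series is synthetic and the check is deliberately crude; it illustrates the concept only and is not how automated ML tests or transforms your data.

```python
from statistics import mean

# A synthetic series with a steady upward trend: its mean drifts over time.
trend = [10 + 3 * t for t in range(12)]

# First differencing (the "integrated" step in ARIMA) removes a linear trend.
differenced = [later - earlier for earlier, later in zip(trend, trend[1:])]

mean_first, mean_second = mean(trend[:6]), mean(trend[6:])
print("raw halves:", mean_first, mean_second)
print("differenced halves:", mean(differenced[:5]), mean(differenced[5:]))
```

The raw series has a different mean in each half, so it is not stationary, while the differenced series has the same mean throughout.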

Configuration settings

Similar to a regression problem, you define standard training parameters like task type, number of iterations, training data, and number of cross-validations. For forecasting tasks, there are additional parameters that must be set that affect the experiment.

The following table summarizes these additional parameters. See the ForecastingParameters class reference documentation for syntax design patterns.

Parameter name | Description | Required
time_column_name | Used to specify the datetime column in the input data used for building the time series and inferring its frequency. |
forecast_horizon | Defines how many periods forward you would like to forecast. The horizon is in units of the time series frequency. Units are based on the time interval of your training data (for example, monthly or weekly) that the forecaster should predict out. |
enable_dnn | Enable Forecasting DNNs. |
time_series_id_column_names | The column name(s) used to uniquely identify the time series in data that has multiple rows with the same timestamp. If time series identifiers are not defined, the data set is assumed to be one time series. To learn more about single time series, see the energy_demand_notebook. |
freq | The time series dataset frequency. This parameter represents the period with which events are expected to occur, such as daily, weekly, or yearly. The frequency must be a pandas offset alias. |
target_lags | Number of rows to lag the target values based on the frequency of the data. The lag is represented as a list or single integer. Lag should be used when the relationship between the independent variables and dependent variable doesn't match up or correlate by default. |
feature_lags | The features to lag are automatically decided by automated ML when target_lags is set and feature_lags is set to auto. Enabling feature lags may help to improve accuracy. Feature lags are disabled by default. |
target_rolling_window_size | n historical periods to use to generate forecasted values, <= training set size. If omitted, n is the full training set size. Specify this parameter when you only want to consider a certain amount of history when training the model. Learn more about target rolling window aggregation. |
short_series_handling_configuration | Enables short time series handling to avoid failing during training due to insufficient data. Short series handling is set to auto by default. Learn more about short series handling. |
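As a rough picture of what target_lags and time_series_id_column_names mean together, the sketch below builds a one-period lag of the target separately for each store with pandas. Automated ML generates lag features internally; the column name sales_lag_1 and this manual construction are assumptions made purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "day_datetime": pd.to_datetime(
        ["2018-09-03", "2018-09-03", "2018-09-04",
         "2018-09-04", "2018-09-05", "2018-09-05"]),
    "store": ["A", "B", "A", "B", "A", "B"],
    "sales_quantity": [2000, 600, 2300, 550, 2100, 650],
})

# Order each series chronologically before shifting.
df = df.sort_values(["store", "day_datetime"]).reset_index(drop=True)

# Lag the target by one row within each store, so store A never sees store B's values.
df["sales_lag_1"] = df.groupby("store")["sales_quantity"].shift(1)
print(df)
```

The first row of each store has no earlier value to borrow, so its lag is NaN. Without the grouping, the lag would mix values from different stores.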

The following code:

  • Leverages the ForecastingParameters class to define the forecasting parameters for your experiment training.
  • Sets the time_column_name to the day_datetime field in the data set.
  • Sets the time_series_id_column_names parameter to "store". This ensures that two separate time-series groups are created for the data: one for store A and one for store B.
  • Sets the forecast_horizon to 50 in order to predict for the entire test set.
  • Sets a rolling window of 10 periods with target_rolling_window_size.
  • Sets target_lags to the recommended "auto" setting, which automatically detects this value for you.
from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(time_column_name='day_datetime',
                                               forecast_horizon=50,
                                               time_series_id_column_names=["store"],
                                               freq='D',  # daily, matching the day_datetime column in sample.csv
                                               target_lags='auto',
                                               target_rolling_window_size=10)

These forecasting_parameters are then passed into your standard AutoMLConfig object along with the forecasting task type, primary metric, exit criteria, and training data.

from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
import logging

automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             experiment_timeout_minutes=15,
                             enable_early_stopping=True,
                             training_data=train_data,
                             label_column_name=label,
                             n_cross_validations=5,
                             enable_ensembling=False,
                             verbosity=logging.INFO,
                             forecasting_parameters=forecasting_parameters)

Featurization steps

In every automated machine learning experiment, automatic scaling and normalization techniques are applied to your data by default. These techniques are types of featurization that help certain algorithms that are sensitive to features on different scales. Learn more about default featurization steps in Featurization in AutoML.

However, the following steps are performed only for forecasting task types:

  • Detect time-series sample frequency (for example, hourly, daily, weekly) and create new records for absent time points to make the series continuous.
  • Impute missing values in the target (via forward-fill) and feature columns (using median column values).
  • Create features based on time series identifiers to enable fixed effects across different series.
  • Create time-based features to assist in learning seasonal patterns.
  • Encode categorical variables to numeric quantities.

To get a summary of what features are created as a result of these steps, see Featurization transparency.
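The first, second, and fourth steps in the list above can be approximated by hand with pandas to show what they do to a small series. This is a sketch under stated assumptions: the price column, the single missing day, and the day_of_week feature are invented for the example, and automated ML's own featurizers are not being called here.

```python
import pandas as pd

# A daily series with 2018-09-05 missing and one missing feature value.
df = pd.DataFrame(
    {"sales_quantity": [2000.0, 2300.0, 2450.0], "price": [9.0, None, 11.0]},
    index=pd.to_datetime(["2018-09-03", "2018-09-04", "2018-09-06"]),
)

# Create records for absent time points so the series is continuous.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="D")
df = df.reindex(full_index)

# Impute the target by forward-fill and the feature by its median.
df["sales_quantity"] = df["sales_quantity"].ffill()
df["price"] = df["price"].fillna(df["price"].median())

# Add a simple time-based feature (Monday is 0).
df["day_of_week"] = df.index.dayofweek
print(df)
```

The inserted row for 2018-09-05 inherits the previous day's sales and the median price, which is why heavy gaps in the raw data can noticeably shape what the model learns.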

Note

Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.

Customize featurization

You also have the option to customize your featurization settings to ensure that the data and features that are used to train your ML model result in relevant predictions.

Supported customizations for forecasting tasks include:

Customization | Definition
Column purpose update | Override the auto-detected feature type for the specified column.
Transformer parameter update | Update the parameters for the specified transformer. Currently supports Imputer (fill_value and median).
Drop columns | Specifies columns to drop from being featurized.

To customize featurizations with the SDK, specify "featurization": FeaturizationConfig in your AutoMLConfig object. Learn more about custom featurizations.

Note

The drop columns functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data cleansing, prior to consuming it in your automated ML experiment.

from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()

# `logQuantity` is a leaky feature, so we remove it.
featurization_config.drop_columns = ['logQuantity']

# Force the CPWVOL5 feature to be of numeric type.
featurization_config.add_column_purpose('CPWVOL5', 'Numeric')

# Fill missing values in the target column, Quantity, with zeroes.
featurization_config.add_transformer_params('Imputer', ['Quantity'], {"strategy": "constant", "fill_value": 0})

# Fill missing values in the `INCOME` column with the median value.
featurization_config.add_transformer_params('Imputer', ['INCOME'], {"strategy": "median"})

If you're using the Azure Machine Learning studio for your experiment, see how to customize featurization in the studio.

Optional configurations

Additional optional configurations are available for forecasting tasks, such as enabling deep learning and specifying a target rolling window aggregation.

Enable deep learning

Note

DNN support for forecasting in Automated Machine Learning is in preview and not supported for local runs.

You can also leverage deep learning with deep neural networks (DNNs) to improve the scores of your model. Automated ML's deep learning allows for forecasting univariate and multivariate time series data.

Deep learning models have three intrinsic capabilities:

  1. They can learn arbitrary mappings from inputs to outputs.
  2. They support multiple inputs and outputs.
  3. They can automatically extract patterns in input data that spans long sequences.

To enable deep learning, set enable_dnn=True in the AutoMLConfig object.

automl_config = AutoMLConfig(task='forecasting',
                             enable_dnn=True,
                             ...
                             forecasting_parameters=forecasting_parameters)

Warning

When you enable DNN for experiments created with the SDK, best model explanations are disabled.

To enable DNN for an AutoML experiment created in the Azure Machine Learning studio, see the task type settings in the studio how-to.

View the Beverage Production Forecasting notebook for a detailed code example leveraging DNNs.

Target Rolling Window Aggregation

Often the best information a forecaster can have is the recent value of the target. Target rolling window aggregations allow you to add a rolling aggregation of data values as features. Generating and using these additional features as extra contextual data helps with the accuracy of the trained model.

For example, say you want to predict energy demand. You might want to add a rolling window feature of three days to account for thermal changes of heated spaces. In this example, create this window by setting target_rolling_window_size=3 in the AutoMLConfig constructor.

The table shows the feature engineering that occurs when window aggregation is applied. Columns for minimum, maximum, and sum are generated on a sliding window of three based on the defined settings. Each row has a new calculated feature. For the timestamp September 8, 2017 4:00 AM, the maximum, minimum, and sum values are calculated using the demand values for September 8, 2017 1:00 AM to 3:00 AM. This window of three shifts along to populate data for the remaining rows.

Figure: target rolling window
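The window arithmetic described above can be reproduced in a few lines. The demand values below are invented for illustration and the helper rolling_window_features is hypothetical; the sketch only shows how the minimum, maximum, and sum for a row come from the three preceding values.

```python
def rolling_window_features(values, window):
    """For each position, aggregate the `window` values that precede it."""
    rows = []
    for t in range(len(values)):
        if t < window:
            # Not enough history yet to fill the window.
            rows.append(None)
        else:
            past = values[t - window:t]
            rows.append({"min": min(past), "max": max(past), "sum": sum(past)})
    return rows

# Hourly demand for 1:00 AM through 5:00 AM (illustrative numbers).
demand = [3000, 3200, 3100, 3400, 3300]
features = rolling_window_features(demand, window=3)

# The 4:00 AM row (index 3) is built from the 1:00 AM to 3:00 AM values.
print(features[3])
```

Note that the current row's own target value is never part of its window, since that value is what the model is trying to predict.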

View a Python code example leveraging the target rolling window aggregate feature.

Short series handling

Automated ML considers a time series a short series if there are not enough data points to conduct the train and validation phases of model development. The number of data points varies for each experiment, and depends on the max_horizon, the number of cross-validation splits, and the length of the model lookback, that is, the maximum amount of history needed to construct the time-series features. For the exact calculation, see the short_series_handling_configuration reference documentation.

Automated ML offers short series handling by default with the short_series_handling_configuration parameter in the ForecastingParameters object.

To enable short series handling, the freq parameter must also be defined. To define an hourly frequency, set freq='H'. For the available frequency strings, see the pandas offset aliases. To change the default behavior, short_series_handling_configuration = 'auto', update the short_series_handling_configuration parameter in your ForecastingParameters object.

from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecast_parameters = ForecastingParameters(time_column_name='day_datetime', 
                                            forecast_horizon=50,
                                            short_series_handling_configuration='auto',
                                            freq = 'H',
                                            target_lags='auto')

The following table summarizes the available settings for short_series_handling_configuration.

Setting | Description
auto | The default behavior for short series handling: if all series are short, pad the data; if not all series are short, drop the short series.
pad | If short_series_handling_configuration = pad, automated ML adds random values to each short series found. Column types are padded as follows: object columns with NaNs; numeric columns with 0; Boolean/logic columns with False; the target column with random values with a mean of zero and a standard deviation of 1.
drop | If short_series_handling_configuration = drop, automated ML drops the short series, and it will not be used for training or prediction. Predictions for these series will return NaNs.
None | No series is padded or dropped.

Warning

Padding may impact the accuracy of the resulting model, since we are introducing artificial data just to get past training without failures.

If many of the series are short, you may also see some impact in explainability results.
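The decision rule in the settings table can be written out as a small function. This is a sketch of the documented behavior only: handle_short_series is a hypothetical helper, and min_points stands in for the experiment-specific threshold that automated ML computes from the horizon, cross-validation splits, and lookback.

```python
def handle_short_series(series_lengths, min_points, setting="auto"):
    """Decide which series IDs would be padded or dropped for a given setting."""
    short = {sid for sid, n in series_lengths.items() if n < min_points}
    if setting is None or not short:
        return {"pad": set(), "drop": set()}
    if setting == "auto":
        # auto pads only when every series is short; otherwise it drops the short ones.
        setting = "pad" if len(short) == len(series_lengths) else "drop"
    if setting == "pad":
        return {"pad": short, "drop": set()}
    if setting == "drop":
        return {"pad": set(), "drop": short}
    raise ValueError(f"unknown setting: {setting!r}")

all_short = handle_short_series({"A": 5, "B": 4}, min_points=10)
some_short = handle_short_series({"A": 50, "B": 4}, min_points=10)
print(all_short, some_short)
```

With auto, a data set where only store B is short loses store B entirely, so predictions for that store would come back as NaN.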

Run the experiment

When you have your AutoMLConfig object ready, you can submit the experiment. After the model finishes, retrieve the best run iteration.

ws = Workspace.from_config()
experiment = Experiment(ws, "Tutorial-automl-forecasting")
local_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = local_run.get_output()

Forecasting with best model

Use the best model iteration to forecast values for the test data set.

Unlike predict(), which is typically used for classification and regression tasks, the forecast() function lets you specify when predictions should start.

In the following example, you first replace all values in y_pred (label_query in the code) with NaN. The forecast origin will be at the end of training data in this case. However, if you replaced only the second half of y_pred with NaN, the function would leave the numerical values in the first half unmodified, but forecast the NaN values in the second half. The function returns both the forecasted values and the aligned features.

You can also use the forecast_destination parameter in the forecast() function to forecast values up until a specified date.

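import numpy as np

# Build the y_pred query: same shape as the test labels, with every value set to NaN.
label_query = test_labels.copy().astype(float)
label_query.fill(np.nan)
# fitted_model is the model returned by local_run.get_output().
label_fcst, data_trans = fitted_model.forecast(
    test_data, label_query, forecast_destination=pd.Timestamp(2019, 1, 8))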
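The two query patterns described above, all NaN versus NaN only in the second half, differ only in how the array passed as y_pred is built. The sketch below constructs both with NumPy on made-up label values; it does not call forecast() itself.

```python
import numpy as np

# Stand-in for the known test labels (illustrative values).
test_labels = np.array([2000, 600, 2300, 550], dtype=float)

# Pattern 1: every value is NaN, so the forecast origin is the end of the training data.
full_query = np.full_like(test_labels, np.nan)

# Pattern 2: keep the first half as known values and ask only for the second half.
half_query = test_labels.copy()
half_query[len(half_query) // 2:] = np.nan

print(full_query, half_query)
```

The second pattern is useful when some of the recent actual values are already known and only the remaining periods need a forecast.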
    

Calculate the root mean squared error (RMSE) between the actual values in actual_labels and the forecasted values in predict_labels.

from sklearn.metrics import mean_squared_error
from math import sqrt

# Assumes the forecast rows line up one-to-one with the test labels.
actual_labels = test_labels
predict_labels = label_fcst

rmse = sqrt(mean_squared_error(actual_labels, predict_labels))
rmse
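For readers who want to see the arithmetic behind the metric, the sketch below computes RMSE from scratch and then divides it by the range of the actual values. Treating that ratio as the meaning of the normalized_root_mean_squared_error primary metric is an assumption made here for illustration; check the metric reference for the exact definition. The numbers are invented.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

def normalized_rmse(actual, predicted):
    """RMSE divided by the range of the actual values (assumed normalization)."""
    return rmse(actual, predicted) / (max(actual) - min(actual))

actual = [2000, 2300, 2100, 2400]
predicted = [2100, 2200, 2100, 2500]

error = rmse(actual, predicted)
normalized_error = normalized_rmse(actual, predicted)
print(round(error, 2), round(normalized_error, 4))
```

Normalizing makes the error comparable across series with very different scales, such as store A and store B in the sample data.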
    

Now that the overall model accuracy has been determined, the most realistic next step is to use the model to forecast unknown future values.

Supply a data set in the same format as the test set test_data but with future datetimes, and the resulting prediction set is the forecasted values for each time-series step. Assume the last time-series records in the data set were for 12/31/2018. To forecast demand for the next day (or as many periods as you need to forecast, <= forecast_horizon), create a single time series record for each store for 01/01/2019.

day_datetime,store,week_of_year
01/01/2019,A,1
01/01/2019,B,1

Repeat the necessary steps to load this future data to a dataframe and then run fitted_model.forecast(test_data) to predict future values.

Note

Values cannot be predicted for a number of periods greater than the forecast_horizon. The model must be re-trained with a larger horizon to predict future values beyond the current horizon.
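Building the future records and guarding against the horizon limit can be combined in one small helper. The sketch assumes daily data and the store and week_of_year columns from the sample file; make_future_frame is a hypothetical function, not part of the SDK.

```python
import pandas as pd

def make_future_frame(last_date, stores, periods, forecast_horizon):
    """Build one row per store for each of the next `periods` days."""
    if periods > forecast_horizon:
        raise ValueError("periods exceeds the forecast_horizon the model was trained with")
    start = pd.Timestamp(last_date) + pd.Timedelta(days=1)
    dates = pd.date_range(start, periods=periods, freq="D")
    rows = [{"day_datetime": d, "store": s} for d in dates for s in stores]
    future = pd.DataFrame(rows)
    # Recreate the week_of_year feature used in training (ISO week number).
    future["week_of_year"] = future["day_datetime"].dt.isocalendar().week.astype(int)
    return future

future_data = make_future_frame("2018-12-31", ["A", "B"], periods=1, forecast_horizon=50)
print(future_data)
```

The resulting frame has the same feature columns as test_data, without the label, and can be passed to the fitted model in place of the test set.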

Example notebooks

See the forecasting sample notebooks for detailed code examples of advanced forecasting configuration.

Next steps