Set up AutoML to train a time-series forecasting model with Python

In this article, you learn how to set up AutoML training for time-series forecasting models with Azure Machine Learning automated ML in the Azure Machine Learning Python SDK.

To do so, you:

  • Prepare data for time series modeling.
  • Configure specific time-series parameters in an AutoMLConfig object.
  • Run predictions with time-series data.

For a low code experience, see the Tutorial: Forecast demand with automated machine learning for a time-series forecasting example using automated ML in the Azure Machine Learning studio.

Unlike classical time series methods, in automated ML, past time-series values are "pivoted" to become additional dimensions for the regressor together with other predictors. This approach incorporates multiple contextual variables and their relationship to one another during training. Since multiple factors can influence a forecast, this method aligns itself well with real world forecasting scenarios. For example, when forecasting sales, interactions of historical trends, exchange rate, and price all jointly drive the sales outcome.


For this article you need,

Preparing data

The most important difference between a forecasting regression task type and regression task type within automated ML is including a feature in your data that represents a valid time series. A regular time series has a well-defined and consistent frequency and has a value at every sample point in a continuous time span.

Consider the following snapshot of a file sample.csv. This data set is of daily sales data for a company that has two different stores, A, and B.

Additionally, there are features for

  • week_of_year: allows the model to detect weekly seasonality.
  • day_datetime: represents a clean time series with daily frequency.
  • sales_quantity: the target column for running predictions.

Read the data into a Pandas dataframe, then use the to_datetime function to ensure the time series is a datetime type.

import pandas as pd
data = pd.read_csv("sample.csv")
data["day_datetime"] = pd.to_datetime(data["day_datetime"])

In this case, the data is already sorted ascending by the time field day_datetime. However, when setting up an experiment, ensure the desired time column is sorted in ascending order to build a valid time series.

The following code,

  • Assumes the data contains 1,000 records, and makes a deterministic split in the data to create training and test data sets.
  • Identifies the label column as sales_quantity.
  • Separates the label field from test_data to form the test_target set.
train_data = data.iloc[:950]
test_data = data.iloc[-50:]

label =  "sales_quantity"
test_labels = test_data.pop(label).values


When training a model for forecasting future values, ensure all the features used in training can be used when running predictions for your intended horizon.

For example, when creating a demand forecast, including a feature for current stock price could massively increase training accuracy. However, if you intend to forecast with a long horizon, you may not be able to accurately predict future stock values corresponding to future time-series points, and model accuracy could suffer.

Training and validation data

You can specify separate train and validation sets directly in the AutoMLConfig object. Learn more about the AutoMLConfig.

For time series forecasting, only Rolling Origin Cross Validation (ROCV) is used for validation by default. Pass the training and validation data together, and set the number of cross validation folds with the n_cross_validations parameter in your AutoMLConfig. ROCV divides the series into training and validation data using an origin time point. Sliding the origin in time generates the cross-validation folds. This strategy preserves the time series data integrity and eliminates the risk of data leakage

rolling origin cross validation

You can also bring your own validation data, learn more in Configure data splits and cross-validation in AutoML.

automl_config = AutoMLConfig(task='forecasting',

Learn more about how AutoML applies cross validation to prevent over-fitting models.

Configure experiment

The AutoMLConfig object defines the settings and data necessary for an automated machine learning task. Configuration for a forecasting model is similar to the setup of a standard regression model, but certain models, configuration options, and featurization steps exist specifically for time-series data.

Supported models

Automated machine learning automatically tries different models and algorithms as part of the model creation and tuning process. As a user, there is no need for you to specify the algorithm. For forecasting experiments, both native time-series and deep learning models are part of the recommendation system. The following table summarizes this subset of models.


Traditional regression models are also tested as part of the recommendation system for forecasting experiments. See the supported model table for the full list of models.

Models Description Benefits
Prophet (Preview) Prophet works best with time series that have strong seasonal effects and several seasons of historical data. To leverage this model, install it locally using pip install fbprophet. Accurate & fast, robust to outliers, missing data, and dramatic changes in your time series.
Auto-ARIMA (Preview) Auto-Regressive Integrated Moving Average (ARIMA) performs best, when the data is stationary. This means that its statistical properties like the mean and variance are constant over the entire set. For example, if you flip a coin, then the probability of you getting heads is 50%, regardless if you flip today, tomorrow, or next year. Great for univariate series, since the past values are used to predict the future values.
ForecastTCN (Preview) ForecastTCN is a neural network model designed to tackle the most demanding forecasting tasks. It captures nonlinear local and global trends in your data and relationships between time series. Capable of leveraging complex trends in your data and readily scales to the largest of datasets.

Configuration settings

Similar to a regression problem, you define standard training parameters like task type, number of iterations, training data, and number of cross-validations. For forecasting tasks, there are additional parameters that must be set that affect the experiment.

The following table summarizes these additional parameters. See the ForecastingParameter class reference documentation for syntax design patterns.

Parameter name Description Required
time_column_name Used to specify the datetime column in the input data used for building the time series and inferring its frequency.
forecast_horizon Defines how many periods forward you would like to forecast. The horizon is in units of the time series frequency. Units are based on the time interval of your training data, for example, monthly, weekly that the forecaster should predict out.
enable_dnn Enable Forecasting DNNs.
time_series_id_column_names The column name(s) used to uniquely identify the time series in data that has multiple rows with the same timestamp. If time series identifiers are not defined, the data set is assumed to be one time-series. To learn more about single time-series, see the energy_demand_notebook.
freq The time series dataset frequency. This parameter represents the period with which events are expected to occur, such as daily, weekly, yearly, etc. The frequency must be a pandas offset alias. Learn more about [frequency].(#frequency-target-data-aggregation)
target_lags Number of rows to lag the target values based on the frequency of the data. The lag is represented as a list or single integer. Lag should be used when the relationship between the independent variables and dependent variable doesn't match up or correlate by default.
feature_lags The features to lag will be automatically decided by automated ML when target_lags are set and feature_lags is set to auto. Enabling feature lags may help to improve accuracy. Feature lags are disabled by default.
target_rolling_window_size n historical periods to use to generate forecasted values, <= training set size. If omitted, n is the full training set size. Specify this parameter when you only want to consider a certain amount of history when training the model. Learn more about target rolling window aggregation.
short_series_handling_config Enables short time series handling to avoid failing during training due to insufficient data. Short series handling is set to auto by default. Learn more about short series handling.
target_aggregation_function The function to be used to aggregate the time series target column to conform to the frequency specified via the freq parameter. The freq parameter must be set, in order to use the target_aggregation_function. Defaults to None; for most scenarios using sum is sufficient.
Learn more about target column aggregation.

The following code,

  • Leverages the ForecastingParameters class to define the forecasting parameters for your experiment training
  • Sets the time_column_name to the day_datetime field in the data set.
  • Defines the time_series_id_column_names parameter to "store". This ensures that two separate time-series groups are created for the data; one for store A and B.
  • Sets the forecast_horizon to 50 in order to predict for the entire test set.
  • Sets a forecast window to 10 periods with target_rolling_window_size
  • Specifies a single lag on the target values for two periods ahead with the target_lags parameter.
  • Sets target_lags to the recommended "auto" setting, which will automatically detect this value for you.
from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecasting_parameters = ForecastingParameters(time_column_name='day_datetime', 

These forecasting_parameters are then passed into your standard AutoMLConfig object along with the forecasting task type, primary metric, exit criteria, and training data.

from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
import logging

automl_config = AutoMLConfig(task='forecasting',

The amount of data required to successfully train a forecasting model with automated ML is influenced by the forecast_horizon, n_cross_validations, and target_lags or target_rolling_window_size values specified when you configure your AutoMLConfig.

The following formula calculates the amount of historic data that what would be needed to construct time series features.

Minimum historic data required: (2x forecast_horizon) + #n_cross_validations + max(max(target_lags), target_rolling_window_size)

An Error exception will be raised for any series in the dataset that does not meet the required amount of historic data for the relevant settings specified.

Featurization steps

In every automated machine learning experiment, automatic scaling and normalization techniques are applied to your data by default. These techniques are types of featurization that help certain algorithms that are sensitive to features on different scales. Learn more about default featurization steps in Featurization in AutoML

However, the following steps are performed only for forecasting task types:

  • Detect time-series sample frequency (for example, hourly, daily, weekly) and create new records for absent time points to make the series continuous.
  • Impute missing values in the target (via forward-fill) and feature columns (using median column values)
  • Create features based on time series identifiers to enable fixed effects across different series
  • Create time-based features to assist in learning seasonal patterns
  • Encode categorical variables to numeric quantities

To get a summary of what features are created as result of these steps, see Featurization transparency


Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.

Customize featurization

You also have the option to customize your featurization settings to ensure that the data and features that are used to train your ML model result in relevant predictions.

Supported customizations for forecasting tasks include:

Customization Definition
Column purpose update Override the auto-detected feature type for the specified column.
Transformer parameter update Update the parameters for the specified transformer. Currently supports Imputer (fill_value and median).
Drop columns Specifies columns to drop from being featurized.

To customize featurizations with the SDK, specify "featurization": FeaturizationConfig in your AutoMLConfig object. Learn more about custom featurizations.


The drop columns functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data cleansing, prior to consuming it in your automated ML experiment.

featurization_config = FeaturizationConfig()

# `logQuantity` is a leaky feature, so we remove it.
featurization_config.drop_columns = ['logQuantitity']

# Force the CPWVOL5 feature to be of numeric type.
featurization_config.add_column_purpose('CPWVOL5', 'Numeric')

# Fill missing values in the target column, Quantity, with zeroes.
featurization_config.add_transformer_params('Imputer', ['Quantity'], {"strategy": "constant", "fill_value": 0})

# Fill mising values in the `INCOME` column with median value.
featurization_config.add_transformer_params('Imputer', ['INCOME'], {"strategy": "median"})

If you're using the Azure Machine Learning studio for your experiment, see how to customize featurization in the studio.

Optional configurations

Additional optional configurations are available for forecasting tasks, such as enabling deep learning and specifying a target rolling window aggregation.

Frequency & target data aggregation

Leverage the frequency, freq, parameter to help avoid failures caused by irregular data, that is data that doesn't follow a set cadence, like hourly or daily data.

For highly irregular data or for varying business needs, users can optionally set their desired forecast frequency, freq, and specify the target_aggregation_function to aggregate the target column of the time series. Leverage these two settings in your AutoMLConfig object can help save some time on data preparation.

When the target_aggregation_function parameter is used,

  • The target column values are aggregated based on the specified operation. Typically, sum is appropriate for most scenarios.

  • Numerical predictor columns in your data are aggregated by sum, mean, minimum value, and maximum value. As a result, automated ML generates new columns suffixed with the aggregation function name and applies the selected aggregate operation.

  • For categorical predictor columns, the data is aggregated by mode, the most prominent category in the window.

  • Date predictor columns are aggregated by minimum value, maximum value and mode.

Supported aggregation operations for target column values include:

Function description
sum  Sum of target values
mean  Mean or average of target values
min Minimum value of a target 
max Maximum value of a target 

Enable deep learning


DNN support for forecasting in Automated Machine Learning is in preview and not supported for local runs or runs initiated in Databricks.

You can also apply deep learning with deep neural networks, DNNs, to improve the scores of your model. Automated ML's deep learning allows for forecasting univariate and multivariate time series data.

Deep learning models have three intrinsic capabilities:

  1. They can learn from arbitrary mappings from inputs to outputs
  2. They support multiple inputs and outputs
  3. They can automatically extract patterns in input data that spans over long sequences.

To enable deep learning, set the enable_dnn=True in the AutoMLConfig object.

automl_config = AutoMLConfig(task='forecasting',


When you enable DNN for experiments created with the SDK, best model explanations are disabled.

To enable DNN for an AutoML experiment created in the Azure Machine Learning studio, see the task type settings in the studio how-to.

View the Beverage Production Forecasting notebook for a detailed code example using DNNs.

Target rolling window aggregation

Often the best information a forecaster can have is the recent value of the target. Target rolling window aggregations allow you to add a rolling aggregation of data values as features. Generating and using these features as extra contextual data helps with the accuracy of the train model.

For example, say you want to predict energy demand. You might want to add a rolling window feature of three days to account for thermal changes of heated spaces. In this example, create this window by setting target_rolling_window_size= 3 in the AutoMLConfig constructor.

The table shows resulting feature engineering that occurs when window aggregation is applied. Columns for minimum, maximum, and sum are generated on a sliding window of three based on the defined settings. Each row has a new calculated feature, in the case of the timestamp for September 8, 2017 4:00am the maximum, minimum, and sum values are calculated using the demand values for September 8, 2017 1:00AM - 3:00AM. This window of three shifts along to populate data for the remaining rows.

target rolling window

View a Python code example applying the target rolling window aggregate feature.

Short series handling

Automated ML considers a time series a short series if there are not enough data points to conduct the train and validation phases of model development. The number of data points varies for each experiment, and depends on the max_horizon, the number of cross validation splits, and the length of the model lookback, that is the maximum of history that's needed to construct the time-series features. For the exact calculation, see the short_series_handling_configuration reference documentation.

Automated ML offers short series handling by default with the short_series_handling_configuration parameter in the ForecastingParameters object.

To enable short series handling, the freq parameter must also be defined. To define an hourly frequency, we will set freq='H'. View the frequency string options here. To change the default behavior, short_series_handling_configuration = 'auto', update the short_series_handling_configuration parameter in your ForecastingParameter object.

from azureml.automl.core.forecasting_parameters import ForecastingParameters

forecast_parameters = ForecastingParameters(time_column_name='day_datetime', 
                                            freq = 'H',

The following table summarizes the available settings for short_series_handling_config.

Setting Description
auto The following is the default behavior for short series handling
  • If all series are short, pad the data.
  • If not all series are short, drop the short series.
  • pad If short_series_handling_config = pad, then automated ML adds random values to each short series found. The following lists the column types and what they are padded with:
  • Object columns with NaNs
  • Numeric columns with 0
  • Boolean/logic columns with False
  • The target column is padded with random values with mean of zero and standard deviation of 1.
  • drop If short_series_handling_config = drop, then automated ML drops the short series, and it will not be used for training or prediction. Predictions for these series will return NaN's.
    None No series is padded or dropped


    Padding may impact the accuracy of the resulting model, since we are introducing artificial data just to get past training without failures.

    If many of the series are short, then you may also see some impact in explainability results

    Run the experiment

    When you have your AutoMLConfig object ready, you can submit the experiment. After the model finishes, retrieve the best run iteration.

    ws = Workspace.from_config()
    experiment = Experiment(ws, "Tutorial-automl-forecasting")
    local_run = experiment.submit(automl_config, show_output=True)
    best_run, fitted_model = local_run.get_output()

    Forecasting with best model

    Use the best model iteration to forecast values for the test data set.

    The forecast_quantiles() function allows specifications of when predictions should start, unlike the predict() method, which is typically used for classification and regression tasks. The forecast_quantiles() method by default generates a point forecast or a mean/median forecast which doesn't have a cone of uncertainty around it.

    In the following example, you first replace all values in y_pred with NaN. The forecast origin is at the end of training data in this case. However, if you replaced only the second half of y_pred with NaN, the function would leave the numerical values in the first half unmodified, but forecast the NaN values in the second half. The function returns both the forecasted values and the aligned features.

    You can also use the forecast_destination parameter in the forecast_quantiles() function to forecast values up to a specified date.

    label_query = test_labels.copy().astype(np.float)
    label_fcst, data_trans = fitted_model.forecast_quantiles(
        test_data, label_query, forecast_destination=pd.Timestamp(2019, 1, 8))

    Often customers want to understand the predictions at a specific quantile of the distribution. For example, when the forecast is used to control inventory like grocery items or virtual machines for a cloud service. In such cases, the control point is usually something like "we want the item to be in stock and not run out 99% of the time". The following demonstrates how to specify which quantiles you'd like to see for your predictions, such as 50th or 95th percentile. If you don't specify a quantile, like in the aforementioned code example, then only the 50th percentile predictions are generated.

    # specify which quantiles you would like 
    fitted_model.quantiles = [0.05,0.5, 0.9]
        test_data, label_query, forecast_destination=pd.Timestamp(2019, 1, 8))

    Calculate root mean squared error (RMSE) between the actual_labels actual values, and the forecasted values in predict_labels.

    from sklearn.metrics import mean_squared_error
    from math import sqrt
    rmse = sqrt(mean_squared_error(actual_labels, predict_labels))

    Now that the overall model accuracy has been determined, the most realistic next step is to use the model to forecast unknown future values.

    Supply a data set in the same format as the test set test_data but with future datetimes, and the resulting prediction set is the forecasted values for each time-series step. Assume the last time-series records in the data set were for 12/31/2018. To forecast demand for the next day (or as many periods as you need to forecast, <= forecast_horizon), create a single time series record for each store for 01/01/2019.


    Repeat the necessary steps to load this future data to a dataframe and then run best_run.forecast_quantiles(test_data) to predict future values.


    In-sample predictions are not supported for forecasting with automated ML when target_lags and/or target_rolling_window_size are enabled.

    Forecasting at scale

    There are scenarios where a single machine learning model is insufficient and multiple machine learning models are needed. For instance, predicting sales for each individual store for a brand, or tailoring an experience to individual users. Building a model for each instance can lead to improved results on many machine learning problems.

    Grouping is a concept in time series forecasting that allows time series to be combined to train an individual model per group. This approach can be particularly helpful if you have time series which require smoothing, filling or entities in the group that can benefit from history or trends from other entities. Many models and hierarchical time series forecasting are solutions powered by automated machine learning for these large scale forecasting scenarios.

    Many models

    The Azure Machine Learning many models solution with automated machine learning allows users to train and manage millions of models in parallel. Many models The solution accelerator leverages Azure Machine Learning pipelines to train the model. Specifically, a Pipeline object and ParalleRunStep are used and require specific configuration parameters set through the ParallelRunConfig.

    The following diagram shows the workflow for the many models solution.

    Many models concept diagram

    The following code demonstrates the key parameters users need to set up their many models run.

    from azureml.train.automl.runtime._many_models.many_models_parameters import ManyModelsTrainParameters
    partition_column_names = ['Store', 'Brand']
    automl_settings = {"task" : 'forecasting',
                       "primary_metric" : 'normalized_root_mean_squared_error',
                       "iteration_timeout_minutes" : 10, #This needs to be changed based on the dataset. Explore how long training is taking before setting this value 
                       "iterations" : 15,
                       "experiment_timeout_hours" : 1,
                       "label_column_name" : 'Quantity',
                       "n_cross_validations" : 3,
                       "time_column_name": 'WeekStarting',
                       "max_horizon" : 6,
                       "track_child_runs": False,
                       "pipeline_fetch_max_batch_size": 15,}
    mm_paramters = ManyModelsTrainParameters(automl_settings=automl_settings, partition_column_names=partition_column_names)

    Hierarchical time series forecasting

    In most applications, customers have a need to understand their forecasts at a macro and micro level of the business; whether that be predicting sales of products at different geographic locations, or understanding the expected workforce demand for different organizations at a company. The ability to train a machine learning model to intelligently forecast on hierarchy data is essential.

    A hierarchical time series is a structure in which each of the unique series are arranged into a hierarchy based on dimensions such as, geography or product type. The following example shows data with unique attributes that form a hierarchy. Our hierarchy is defined by: the product type such as headphones or tablets, the product category which splits product types into accessories and devices, and the region the products are sold in.

    Example raw data table for hierarchical data

    To further visualize this, the leaf levels of the hierarchy contain all the time series with unique combinations of attribute values. Each higher level in the hierarchy considers one less dimension for defining the time series and aggregates each set of child nodes from the lower level into a parent node.

    Hierarchy visual for data

    The hierarchical time series solution is built on top of the Many Models Solution and share a similar configuration setup.

    The following code demonstrates the key parameters to set up your hierarchical time series forecasting runs.

    from azureml.train.automl.runtime._hts.hts_parameters import HTSTrainParameters
    model_explainability = True
    engineered_explanations = False # Define your hierarchy. Adjust the settings below based on your dataset.
    hierarchy = ["state", "store_id", "product_category", "SKU"]
    training_level = "SKU"# Set your forecast parameters. Adjust the settings below based on your dataset.
    time_column_name = "date"
    label_column_name = "quantity"
    forecast_horizon = 7
    automl_settings = {"task" : "forecasting",
                       "primary_metric" : "normalized_root_mean_squared_error",
                       "label_column_name": label_column_name,
                       "time_column_name": time_column_name,
                       "forecast_horizon": forecast_horizon,
                       "hierarchy_column_names": hierarchy,
                       "hierarchy_training_level": training_level,
                       "track_child_runs": False,
                       "pipeline_fetch_max_batch_size": 15,
                       "model_explainability": model_explainability,# The following settings are specific to this sample and should be adjusted according to your own needs.
                       "iteration_timeout_minutes" : 10,
                       "iterations" : 10,
                       "n_cross_validations": 2}
    hts_parameters = HTSTrainParameters(

    Example notebooks

    See the forecasting sample notebooks for detailed code examples of advanced forecasting configuration including:

    Next steps