question

movingabout-2877 asked ramr-msft edited

AutoML: problem with univariate time series forecasting

I'm having trouble generating univariate time series forecasts with Azure Automated Machine Learning (I know...).

What I'm doing

So I have about 5 years' worth of monthly observations in a dataframe that looks like this:

date target_value
2015-02-01 123
2015-03-01 456
2015-04-01 789
... ...

I want to forecast target_value based on past values of target_value, i.e. univariate forecasting as done by ARIMA, for instance.
So I am setting up the AutoML forecast like this:


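 import logging

 from azureml.core import Dataset
 from azureml.automl.core.forecasting_parameters import ForecastingParameters
 from azureml.train.automl import AutoMLConfig

 # that's the dataframe as shown above
 train_data = Dataset.Tabular.from_delimited_files(path=datastore.path(my_remote_filename))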
    
 # ...other code...
    
 forecasting_parameters = ForecastingParameters(
     time_column_name='date',
     forecast_horizon=2,
     target_lags='auto',
     freq='MS'
 )
    
 automl_config = AutoMLConfig(task='forecasting',
                              debug_log='automl_forecasting_function.log',
                              primary_metric='normalized_root_mean_squared_error',
                              enable_dnn=True,
                              experiment_timeout_hours=8.0,
                              enable_early_stopping=True,
                              training_data=train_data,
                              compute_target='my-cluster',
                              n_cross_validations=3,
                              verbosity=logging.INFO,
                              max_concurrent_iterations=4,
                              max_cores_per_iteration=-1,
                              label_column_name='target_value',
                              forecasting_parameters=forecasting_parameters)

What the problem is

But AutoML does not seem to generate the forecast for target_value based on past values of target_value. It seems to use the date column as the independent variable!
The feature importance chart also shows date as the input feature:

[Image: feature importance chart, 84928-5ajgr.png]

As a side note: running multivariate forecasts works fine.
When I use a dataset like this, feature_1 and feature_2 are used (i.e. as the X) to forecast target_value (i.e. the y):

date feature_1 feature_2 target_value
2015-02-01 10 7 123
2015-03-01 30 2 456
2015-04-01 20 5 789
... ... ... ...

My question, therefore:
How do I need to set up a univariate AutoML forecast to forecast target_value based on past observations of target_value?
I assumed that generating lagged values for target_value etc. is exactly what AutoML is supposed to do.

Thanks!


azure-machine-learning

@movingabout-2877 Thanks for the question. To help users get sound predictions, we have released rolling prediction and evaluation, which lets users forecast shorter periods, automatically append those predictions to the training data, and keep forecasting until the desired forecast horizon is reached.
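The rolling idea described above can be sketched in plain Python. This is only an illustration of the general technique, not AutoML's implementation: `naive_forecast` is a hypothetical stand-in model (mean of the last few values), and `rolling_forecast`, `step`, and `window` are names invented for this sketch.

```python
from statistics import mean


def naive_forecast(history, steps, window=3):
    """Hypothetical stand-in model: repeat the mean of the last `window` values."""
    level = mean(history[-window:])
    return [level] * steps


def rolling_forecast(history, total_horizon, step, model=naive_forecast):
    """Forecast `total_horizon` periods in chunks of at most `step` periods.

    Each chunk of predictions is appended to the history before the next
    chunk is forecast, so later chunks are conditioned on earlier predictions.
    """
    extended = list(history)  # copy so the caller's data is not modified
    predictions = []
    while len(predictions) < total_horizon:
        n = min(step, total_horizon - len(predictions))
        chunk = model(extended, n)
        predictions.extend(chunk)
        extended.extend(chunk)
    return predictions


history = [123, 456, 789]
forecast = rolling_forecast(history, total_horizon=6, step=2)
print(forecast)
```

Because later chunks are built on earlier predictions rather than on actual observations, errors can accumulate as the horizon grows.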

Please follow the sample "Forecasting away from training data" and the doc "Auto-train a time-series forecast model".



1 Answer

ramr-msft answered ramr-msft edited

@movingabout-2877 Thanks. AutoML does use the date column as an independent variable. We engineer several features from it; this is standard practice for learning seasonal patterns. In the given scenario, the date column will be featurized to represent 'day', 'month', 'day of week', etc. This is done so that regression-based models can be trained on this data, using the generated columns for prediction.
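As a rough illustration of what such date featurization looks like, here is a minimal sketch using only the standard library. It is not AutoML's actual featurizer: the function name `calendar_features` and the exact column names are assumptions, and only the three features named above are generated.

```python
from datetime import date


def calendar_features(d):
    """Expand a date into simple calendar columns (illustrative names only)."""
    return {
        "day": d.day,
        "month": d.month,
        "day_of_week": d.weekday(),  # Monday is 0, Sunday is 6
    }


dates = ["2015-02-01", "2015-03-01", "2015-04-01"]
features = [calendar_features(date.fromisoformat(s)) for s in dates]
for s, row in zip(dates, features):
    print(s, row)
```

Note that for month-start data like the series in the question, the day column is always 1, so it carries no information; the month column is what lets a regression model pick up yearly seasonality.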

Please remove target_lags='auto' to allow ARIMA to be selected. We have to block certain models (e.g. ARIMA) when target lags are set. This is a product gap that we're in the process of fixing.


Thanks for the answer!

However, after removing target_lags='auto', still only the date column is used as an independent variable. Past values of the target_value column are not used.

How do I need to set up the AutoMLConfig and ForecastingParameters to use past values of target_value to forecast target_value in the case of only two columns date and target_value?


Concerning my follow-up question (i.e. "autoregressively" using past values of target_value as an independent variable): could you provide help on this?

If AutoML can't do this automatically, I'd resort to adding lagged values of target_value manually as an independent variable.
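A minimal sketch of that manual fallback, in plain Python so it runs without extra packages. The helper name `add_lag_features` and the `_lagK` column naming are made up for this example, and it assumes the rows are already sorted by date with no missing months.

```python
def add_lag_features(rows, target="target_value", lags=(1, 2)):
    """Return new rows with lagged copies of the target as extra columns.

    Rows that do not have a full lag history (the first max(lags) rows)
    are dropped, since their lagged values are unknown.
    """
    max_lag = max(lags)
    out = []
    for i in range(max_lag, len(rows)):
        new_row = dict(rows[i])  # copy so the input rows stay untouched
        for k in lags:
            new_row[f"{target}_lag{k}"] = rows[i - k][target]
        out.append(new_row)
    return out


rows = [
    {"date": "2015-02-01", "target_value": 123},
    {"date": "2015-03-01", "target_value": 456},
    {"date": "2015-04-01", "target_value": 789},
]
lagged = add_lag_features(rows)
print(lagged)
```

With pandas, the equivalent column is `df["target_value"].shift(k)`. The resulting table has the same shape as the multivariate dataset in the question, with the lag columns in the role of feature_1 and feature_2. One caveat: when forecasting more than one step ahead, a lag shorter than the forecast horizon is not known at prediction time for the later steps.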


@movingabout-2877 Thanks for the details. We recommend raising an Azure support ticket for your resource from the Help + Support blade in the Azure portal. This will let you share the details securely and work with an engineer who can provide more insight into the issue and check whether it can be replicated.
