New Time Series Features for SQL Server 2008
The time series algorithm has had some major changes for SQL Server 2008. We've ramped up the algorithm and added prediction against new data. While the new data prediction is just as cool as the algorithm changes, I'll have to save that for later - this post I'll just tell you about the algorithm.
As you may or may not know, the time series algorithm in SQL Server 2005 uses the ARTXP algorithm from Microsoft Research. ARTXP stands for "Auto Regression Trees with Cross Predict" and of course, since you have to have an 'X' in any acronym worth its salt, "cross" is represented by an 'X'. ARTXP uses similar regression trees as are found in our decision tree algorithm to predict future values based on past history, including the past history of other series. It turns out that ARTXP, as verified through testing by Microsoft Research and the product team, is wickedly accurate at predicting the next step of a series - better than all other algorithms we've tried. However, as you move away from the time horizon, depending on your data, the predictions can grow unstable fairly quickly. Veteran users of the algorithm know how that's handled by the system - it agressively cuts off the prediction stream - i.e. it just stops.
We heard some frustration from users that hit this, and knew we had to do something. We didn't want to lose the accuracy or cross predictive ability of the ARTXP algorithm, but needed to control stability. Additionally, we've had many requests for more "industry standard" algorithm implementations simply because they are well known (not because they are necessarily better). So what did we do?
In SQL Server 2008 Data Mining, we have enhanced the Time Series algorithm to include not only an implementation of ARTXP, but also an implementation of ARIMA (Auto Regressive Integrated Moving Average). The ARIMA algorithm is well-known and although all autoregressive algorithms have the potential for instability in long-term predictions, ARIMA tends to hold out for most data sets. So, does this mean that you as the user have to decide between short-term accuracy and long term stabililty? The answer is a resounding NO! The SQL Server 2008 Time Series algorithm by default builds both an ARTXP model and an ARIMA model. For predictions, a weighted averaging scheme is used to favor the short term accuracy of ARTXP and the long term stability of ARIMA - essentially the best of both worlds!
To control this new behavior we have add three new parameters:
FORECAST_METHOD - controls which algorithms are used with MIXED being the default, but you can specify ARIMA or ARTXP as well.
PREDICTION_SMOOTHING - controls the mixture of the ARTXP and ARIMA results. Setting this value closer to 0 favors ARTXP more heavily, whereas setting it closer to 1 favors ARIMA more.
INSTABILITY_SENSITIVITY - ok, this one has nothing to do with ARIMA, just your feedback on ARTXP. If you specify ARTXP as the forecast method, you can now control the threshold at which the algorithm will cut off predictions. Setting this to 0 will turn of instability checks altogether. (By the way, with ARTXP if your regression formulae have coefficients that are all less than 1, you will never have a stability problem. Stability issues only occur when there are coefficients greater than 1.
If you want to try out the new Time Series algorithm, it is available in the July 2007 CTP (and later) of SQL Server 2008 which you can find under www.microsoft.com/sql/2008. In a later post (I know better than to specify when) I'll talk about the new prediction methods that you can apply to Time Series models, making these model reusable in a way never available before in SQL Server.