Predicting Using the Averaged Forecasting Model (Intermediate Data Mining Tutorial)
In an earlier task in this lesson, you used the Prediction Query Builder to create a query that generates the default predictions for the Forecasting model. In this task, you will learn to create a prediction query that applies the worldwide sales model to one of the individual regions.
Using Replacement Data in a Time Series Prediction
In SQL Server 2008, the PredictTimeSeries (DMX) function has two new parameters, EXTEND_MODEL_CASES and REPLACE_MODEL_CASES, which enable you to customize time series predictions.
EXTEND_MODEL_CASES lets you add data to an existing model and make predictions based on the new data.
REPLACE_MODEL_CASES lets you replace the data series that was used to create the model with the data from another series.
In this scenario, you will use REPLACE_MODEL_CASES. However, you must first set up a data source view that contains only the replacement data. Replacement data means the data for each individual series. Then you will create a prediction query that applies the general model to the replacement data.
To create the data source view that contains the replacement data
In Solution Explorer, right-click Data Source Views, and select New Data Source View.
In the Data Source View wizard, make the following selections:
Data Source: Adventure Works DW2008
Select Tables and Views: Do not select any tables
Name: T1000 Pacific Region
Right-click the empty design surface for T1000 Pacific Region.dsv, and select New Named Query.
The Create Named Query dialog box opens. Retype the name, and add the following description:
Name: T1000 Pacific Region
Description: FiltervTimeSeriesby region and model
In the text pane, type the following query:
SELECT ReportingDate, ModelRegion, Quantity, Amount FROM dbo.vTimeSeries WHERE (ModelRegion = N'T1000 Pacific')
You can also build the query by using the Query Builder, but it is faster to type the text of the query. However, after you have typed the query text, you can click another pane to see how the query text is parsed and the query is built from the supporting tables. You might also want to copy the query text and save it to a text file so that you can re-use it for another data series.
In the Data Source View design surface, right-click T1000 Pacific, and select Explore Data to verify that the data is filtered correctly.
Now, you will build a time series prediction query as usual, but add the REPLACE_MODEL_CASES parameter to specify that the data series on which the prediction is based should be replaced with the new data that you provide. You must also specify the mapping between the mining model and the input table.
To build the prediction query and provide replacement data
If the model is not already open, double-click the AllRegions structure, and in Data Mining Designer, click the Mining Model Prediction tab.
In the Mining Model pane, the model AllRegions should already be selected, If it is not, click Select Model, and select the model, AllRegions.
In the Select Input Table(s) pane, click Select Case Table.
In the Select Table dialog box, change the data source to T1000 Pacific Region, and click OK.
The data source view that you created is displayed as a table with columns. Some columns might be already mapped to the mining model columns.
Right-click the join line between the mining model and the input data and select Modify Connections.
In the Modify Mapping dialog box, verify that the ReportingDate column in the mining model is mapped to the ReportingDate column in the input data.
In the Modify Mapping dialog box, in the row for AvgQty, click under Table Column and select T1000 Pacific.Quantity. Click OK.
This step maps the aggregated quantity column in the general model to the quantity column in the replacement data series.
In the grid, click the first empty row, under Source, and select AllRegions. For Field, select Region. For Alias, type Model Used.
This step adds a column to the results that helps you remember that the predictions are from the general model.
Click a new empty row, and under Source, select Custom Expression. For Alias, type ModelRegion. In the Criteria/Argument, type 'T1000 Pacific'.
This step adds a label to the results that helps you remember which series the predictions are for.
Click a new empty row, and under Source, select Prediction Function. For Field, select PredictTimeSeries. For Alias, type Predicted Values.
Drag the field AvgQty from the Mining Model pane and drop it into the Criteria/Argument column.
In the Criteria/Argument box, after the field name, type the following text: ,5, REPLACE_MODEL_CASES
The complete text of the Criteria/Argument text box should be as follows: [AllRegions].[AvgQty],5,REPLACE_MODEL_CASES
Viewing the Results
The prediction query returns results similar to the following:
7/25/2004 12:00:00 AM68
8/25/2004 12:00:00 AM52
9/25/2004 12:00:00 AM48
10/25/2004 12:00:00 AM56
11/25/2004 12:00:00 AM44
To apply the general model to a different data series, such as the T1000 product model in the North America region, you must create a different query for each series. However, rather than go through the entire process of building a data source view, you can edit the DMX statement that you created, and filter the inputs differently. For example, the following DMX statement represents the query that you just built:
SELECT ([All Regions].[Region]) as [Model Used], ( 'T-1000 Pacific') as [ModelRegion], (PredictTimeSeries([All Regions].[Avg Qty],5, REPLACE_MODEL_CASES)) as [Predicted Quantity] FROM [All Regions] PREDICTION JOIN OPENQUERY([Adventure Works DW2008], 'SELECT [ReportingDate] FROM (SELECT ReportingDate, ModelRegion, Quantity, Amount FROM dbo.vTimeSeries WHERE (ModelRegion = N''T1000 Pacific'')) as [T1000 Pacific] ') AS t ON [All Regions].[Reporting Date] = t.[ReportingDate] AND [All Regions].[Avg Qty] = t.[Quantity]
To apply this to a different model, you simply edit the query statement to replace the filter condition and the labels applied to each result. For example, if you change the filter conditions and column labels by replacing 'Pacific' with 'North America', you will get predictions for the T1000 product in North America, based on the patterns in the general model.
T-1000 North America
7/25/2004 12:00:00 AM103
8/25/2004 12:00:00 AM84
9/25/2004 12:00:00 AM79
10/25/2004 12:00:00 AM85
11/25/2004 12:00:00 AM68
Next Task in Lesson