Exploring the Targeted Mailing Models (Data Mining Tutorial)
After the models in your project are processed, you can view them by using the Mining Model Viewer tab in Data Mining Designer. You can use the Mining Model list at the top of the tab to examine the individual models in the mining structure.
The following sections describe how to explore mining models in the viewers.
- Microsoft Decision Tree Model
- Microsoft Clustering Model
- Microsoft Naive Bayes Model
Microsoft Decision Tree Model
When you switch to the Mining Model Viewer tab in Data Mining Designer for the Adventure Works DM tutorial project, the designer opens to the targeted mailing mining model, the first model in the structure. Each algorithm that you use to build a model in Analysis Services returns a different type of result, so Analysis Services provides a separate viewer for each algorithm. When you browse a mining model, it is displayed on the Mining Model Viewer tab in the viewer that matches its algorithm. For the decision tree model, this is the Microsoft Tree Viewer, which contains two tabs: Decision Tree and Dependency Network.
On the Decision Tree tab, you can examine all the tree models that make up a mining model. Because the targeted mailing model in this tutorial project contains only a single predictable attribute, Bike Buyer, there is only one tree to view. If there were more trees, you could use the Tree box to choose another tree.
By default, the Microsoft Tree Viewer shows only the first three levels of the tree. If the tree contains fewer than three levels, the viewer shows only the existing levels. You can view more levels by using the Show Level slider or the Default Expansion list. For more information about how to configure the viewer, see Viewing a Mining Model with the Microsoft Tree Viewer.
To modify the tree
1. Slide Show Level to 5.
2. In the Background list, select 1.
By changing the Background setting, you can quickly see how many cases in each node have a Bike Buyer value of 1. The darker the shading of a node, the more such cases the node contains.
Each node in the decision tree displays the following information:
- The condition that is required to reach that node from the node that comes before it. You can see the full node path in the Mining Legend or by pausing the pointer over a node to display an InfoTip.
- A histogram that describes the distribution of states for the predictable column in order of popularity. You can control how many states appear in the histogram by using the Histograms control.
- The concentration of cases, if the state of the predictable attribute is specified in the Background control.
You can see the training cases that each node supports by right-clicking the node and then selecting Drill Through.
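The node histogram and the Background shading described above can be thought of as simple counts over the cases that reach a node. The following is a minimal Python sketch of that idea, not the viewer's actual implementation. The case values, the variable names, and the functions `node_histogram` and `matching_cases` are all hypothetical and are not taken from the Adventure Works data.

```python
from collections import Counter

# Hypothetical Bike Buyer states for the training cases that reached one node.
# These values are invented for illustration only.
node_cases = [1, 0, 1, 1, 0, 1, 0, 1]


def node_histogram(states):
    """Return (state, count) pairs, most frequent state first."""
    return Counter(states).most_common()


def matching_cases(states, target_state):
    """Count the cases in the node whose predictable state equals target_state."""
    return sum(1 for state in states if state == target_state)


histogram = node_histogram(node_cases)
buyers_in_node = matching_cases(node_cases, 1)
buyer_share = buyers_in_node / len(node_cases)

print(histogram)
print(buyers_in_node, buyer_share)
```

In this sketch, a node with a larger `buyers_in_node` value would be shaded darker when Background is set to 1.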
The Dependency Network tab displays the relationships between the attributes that contribute to the predictive ability of the mining model.
The center node for the dependency network, Bike Buyer, represents the predictable attribute in the mining model. Each surrounding node represents an attribute that affects the outcome of the predictable attribute. You can use the slider on the left side of the tab to control the strength of the links that are shown. When you move the slider down, only the strongest links are shown.
Click an individual node in the network and then refer to the color legend at the bottom of the tab to see which nodes the selected node predicts or which nodes the selected node is predicted by.
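One way to picture the slider is as a filter that keeps only the highest-ranked links. The following Python sketch assumes that each link has a numeric strength and that lowering the slider keeps fewer links. The attribute names, the strength values, and the function `strongest_links` are hypothetical, and the viewer may rank links differently.

```python
# Hypothetical link strengths between input attributes and the predictable
# attribute. The names and numbers are invented for illustration only.
link_strength = {
    "Attribute A": 0.9,
    "Attribute B": 0.7,
    "Attribute C": 0.4,
    "Attribute D": 0.2,
}


def strongest_links(strengths, keep):
    """Return the names of the `keep` strongest links, strongest first."""
    ranked = sorted(strengths.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


# In this sketch, moving the slider down corresponds to a smaller `keep` value.
for keep in (4, 2, 1):
    print(keep, strongest_links(link_strength, keep))
```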
Microsoft Clustering Model
Use the Mining Model list at the top of the Mining Model Viewer tab to switch to the TM_Clustering model. The viewer for this model, the Microsoft Cluster Viewer, contains four tabs: Cluster Diagram, Cluster Profiles, Cluster Characteristics, and Cluster Discrimination. By default, the viewer displays the Cluster Diagram tab when it first opens.
For more information about how to configure the Microsoft Cluster Viewer, see Viewing a Mining Model with the Microsoft Cluster Viewer.
With the Cluster Diagram tab, you can explore the relationships between the clusters that the algorithm discovers. The lines between the clusters represent "closeness" and are shaded based on how similar the clusters are. The shading of each cluster represents how frequently the selected variable and state occur in that cluster. You can select the variable and the state in the Shading Variable and State boxes at the top of the tab. The default variable is Population, but you can change it to any attribute in the model to discover which clusters contain members that have the attributes you want. By using the slider to the left of the network, you can filter out the weaker links and find the clusters with the closest relationships.
For example, set Shading Variable to Bike Buyer and State to 1. You will see that Cluster 5 contains the highest density of bike buyers, and that the strongest relationship exists between Cluster 4 and Cluster 7.
The Cluster Profiles tab provides an overall view of the TM_Clustering model. The tab contains a column for each cluster in the model. The first column lists the attributes that are associated with at least one cluster. The rest of the viewer shows the distribution of the states of each attribute for each cluster. The distribution of a discrete variable is shown as a colored bar, and the Histogram bars list sets the maximum number of bars displayed. Continuous attributes are displayed with a diamond chart, which represents the mean and standard deviation in each cluster.
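The two statistics behind a diamond chart, the mean and the standard deviation of a continuous attribute within a cluster, can be computed as in the following Python sketch. The cluster names and age values are invented for illustration, the function `summarize` is hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption, because this tutorial does not state which form the viewer uses.

```python
from statistics import mean, pstdev

# Hypothetical ages of the customers assigned to two clusters.
# These values are invented for illustration only.
ages_by_cluster = {
    "Cluster A": [32, 34, 34, 34, 35, 35, 37, 39],
    "Cluster B": [50, 60, 70],
}


def summarize(values):
    """Return the (mean, population standard deviation) of a list of numbers."""
    return mean(values), pstdev(values)


summaries = {name: summarize(ages) for name, ages in ages_by_cluster.items()}
for name, (average, spread) in summaries.items():
    print(f"{name}: mean={average:.1f}, standard deviation={spread:.1f}")
```

A cluster with a small standard deviation would correspond to a short diamond, and a cluster with a wide spread of values to a taller one.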
With the Cluster Characteristics tab, you can examine in more detail the characteristics that make up a cluster. For example, if you use the Cluster list to display Cluster 5 in this tutorial scenario, you can see that the people in this cluster, customers who have purchased a bike in the past, tend to have the following characteristics: they commute only 0-1 miles, they do not own a car, and they are married.
With the Cluster Discrimination tab, you can explore the characteristics that distinguish one cluster from another. After you select two clusters from the Cluster 1 and Cluster 2 boxes, the viewer determines the differences between the clusters and displays them in order of the attributes that distinguish the clusters most.
For example, compare Cluster 5 and Cluster 7 from the TM_Clustering model. Cluster 5 contains the highest density of bike buyers, and Cluster 7 contains the lowest. People in Cluster 7 tend to be from North America and to be younger, aged 23 to 31, while people in Cluster 5 tend to be from Europe and to have a short commute, between zero and one mile.
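A comparison of this kind can be approximated by ranking attribute states by how much their frequency differs between the two clusters. The following Python sketch illustrates that idea only. The percentages are invented, the function `discriminate` is hypothetical, and the viewer's actual scoring method may differ.

```python
# Hypothetical percentage of cases in each cluster that have a given
# attribute state. These numbers are invented for illustration only.
first_cluster = {
    "Region = Europe": 60,
    "Commute Distance = 0-1 Miles": 70,
    "Age = 23-31": 10,
}
second_cluster = {
    "Region = Europe": 15,
    "Commute Distance = 0-1 Miles": 40,
    "Age = 23-31": 65,
}


def discriminate(first, second):
    """Rank attribute states by the absolute difference in frequency.

    A positive difference favors the first cluster and a negative
    difference favors the second cluster.
    """
    states = set(first) | set(second)
    differences = {s: first.get(s, 0) - second.get(s, 0) for s in states}
    return sorted(differences.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = discriminate(first_cluster, second_cluster)
for state, difference in ranking:
    favored = "first cluster" if difference > 0 else "second cluster"
    print(f"{state}: favors {favored} by {abs(difference)} points")
```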
Microsoft Naive Bayes Model
Use the Mining Model list at the top of the Mining Model Viewer tab to switch to the TM_NaiveBayes model. The viewer for this model, the Microsoft Naive Bayes Viewer, contains four tabs: Dependency Network, Attribute Profiles, Attribute Characteristics, and Attribute Discrimination.
For more information about how to use the Microsoft Naive Bayes Viewer, see Viewing a Mining Model with the Microsoft Naive Bayes Viewer.
The Dependency Network tab works just like the Dependency Network tab for the Microsoft Tree Viewer. Each node in the viewer represents an attribute, and the lines between nodes represent relationships. In the viewer, you can see all the attributes that affect the state of the predictable attribute, Bike Buyer.
As you lower the slider, only the attributes that have the greatest effect on the Bike Buyer column remain. By adjusting the slider, you can discover that the number of cars owned is the greatest factor in determining whether someone is a bike buyer.
The Attribute Profiles tab describes how different states of the input attributes affect the outcome of the predictable attribute.
In the Predictable box, verify that Bike Buyer is selected. The attributes that affect the state of this predictable attribute are listed together with the values of each state of the input attributes and their distributions in each state of the predictable attribute.
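The distributions on this tab can be read as conditional frequencies: for each state of the predictable attribute, how often each state of an input attribute occurs. The following Python sketch computes such a profile for a single input attribute. The cases are invented for illustration, and the function `attribute_profile` is hypothetical rather than part of Analysis Services.

```python
from collections import Counter, defaultdict

# Hypothetical cases as (cars owned, Bike Buyer) pairs.
# These values are invented for illustration only.
cases = [
    ("0", 1), ("0", 1), ("2", 0), ("1", 1),
    ("2", 0), ("2", 1), ("1", 0), ("0", 0),
]


def attribute_profile(pairs):
    """Return, for each outcome, the share of cases in each input state."""
    counts = defaultdict(Counter)
    for input_state, outcome in pairs:
        counts[outcome][input_state] += 1
    profile = {}
    for outcome, state_counts in counts.items():
        total = sum(state_counts.values())
        profile[outcome] = {s: n / total for s, n in state_counts.items()}
    return profile


profile = attribute_profile(cases)
for outcome in sorted(profile):
    print(f"Bike Buyer = {outcome}: {profile[outcome]}")
```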
With the Attribute Characteristics tab, you can select an attribute and value to see how frequently values for other attributes appear in the selected value cases.
In the Attribute list, verify that Bike Buyer is selected, and in the Value list, select 1. In the viewer, you will see that people who commute between zero and one mile to work and who live in the North American region tend to buy the most bicycles.
With the Attribute Discrimination tab, you can investigate the relationship between two discrete values of the selected predictable attribute and other attribute values. Because the TM_NaiveBayes model has only two states, 1 and 0, you do not have to make any changes to the viewer.
In the viewer, you can see that people who do not own cars tend to buy bicycles, and people who own two cars tend not to buy bicycles.