Working with Data Mining

Data mining gives you access to the information that you need to make intelligent decisions about difficult business problems. Microsoft SQL Server 2005 Analysis Services (SSAS) provides tools for data mining with which you can identify rules and patterns in your data, so that you can determine why things happen and predict what will happen in the future. When you create a data mining solution in Analysis Services, you first create a model that describes your business problem, and then you run your data through an algorithm that generates a mathematical model of the data, a process that is known as training the model. You can then either visually explore the mining model or create prediction queries against it. Analysis Services can use datasets from both relational and OLAP databases, and includes a variety of algorithms that you can use to investigate that data. For a high-level overview of the data mining process, see Data Mining Concepts.

SQL Server 2005 provides different environments and tools that you can use for data mining. The following sections outline a typical process for creating a data mining solution, and identify the resources to use for each step.

Creating an Analysis Services Project

To create a data mining solution, you must first create a new Analysis Services project, and then add and configure a data source and a data source view for the project. The data source defines the connection string and authentication information with which to connect to the data source on which to base the mining model. The data source view provides an abstraction of the data source, which you can use to modify the structure of the data to make it more relevant to your project.

For More Information:Defining an Analysis Services Project, Defining a Data Source Using the Data Source Wizard, Defining a Data Source View Using the Data Source View Wizard

Adding Mining Structures to an Analysis Services Project

After you have created an Analysis Services project, you can add mining structures, and one or more mining models that are based on each structure. A mining structure, including tables and columns, is derived from an existing data source view or OLAP cube in the project. Adding a new mining structure starts the Data Mining Wizard, which you use to define the structure and to specify an algorithm and training data for use in creating an initial model based on that structure.

For More Information:Creating a New Mining Structure, Data Mining Algorithms, Data Mining Wizard

You can use the Mining Structure tab of Data Mining Designer to modify existing mining structures, including adding columns and nested tables.

For More Information:Data Mining Designer

Working with Data Mining Models

Before you can use the mining models you define, you must process them so that Analysis Services can pass the training data through the algorithms to fill the models. Analysis Services provides several options for processing mining model objects, including the ability to control which objects are processed and how they are processed.

For More Information:Processing in Analysis Services, Processing Data Mining Objects

After you have processed the models, you can investigate the results and make decisions about which models perform the best. Analysis Services provides viewers for each mining model type, within the Mining Model Viewer tab in Data Mining Designer, which you can use to explore the mining models. Analysis Services also provides tools, in the Mining Accuracy Chart tab of the designer, that you can use to directly compare mining models and to choose the mining model that works best for your purpose. These tools include a lift chart, a profit chart, and a classification matrix.

For More Information:Viewing a Data Mining Model, Validating Data Mining Models

Creating Predictions

The main goal of most data mining projects is to use a mining model to create predictions. After you explore and compare mining models, you can use one of several tools to create predictions. Analysis Services provides a query language called Data Mining Extensions (DMX) that is the basis for creating predictions. To help you build DMX prediction queries, SQL Server provides a query builder, available in SQL Server Management Studio and Business Intelligence Development Studio, and DMX templates for the query editor in Management Studio. Within BI Development Studio, you access the query builder from the Mining Model Prediction tab of Data Mining Designer.

For More Information:Creating DMX Prediction Queries, Data Mining Extensions (DMX) Statement Reference

SQL Server Management Studio

After you have used BI Development Studio to build mining models for your data mining project, you can manage and work with the models and create predictions in Management Studio.

For More Information:Data Mining in SQL Server Management Studio

SQL Server Reporting Services

After you create a mining model, you may want to distribute the results to a wider audience. You can use Report Designer in Microsoft SQL Server 2005 Reporting Services (SSRS) to create reports, which you can use to present the information that a mining model contains. You can use the result of any DMX query as the basis of a report, and can take advantage of the parameterization and formatting features that are available in Reporting Services.

For More Information:Working with Report Designer, Using Analysis Services DMX Query Designer

Working Programmatically with Data Mining

Analysis Services provides several tools that you can use to programmatically work with data mining. The Data Mining Extensions (DMX) language provides statements that you can use to create, train, and use data mining models. You can also perform these tasks by using a combination of XML for Analysis (XMLA) and Analysis Services Scripting Language (ASSL), or by using Analysis Management Objects (AMO).

You can access all the metadata that is associated with data mining by using data mining schema rowsets. For example, you can use schema rowsets to determine the data types that an algorithm supports, or the model names that exist in a database.

For More Information:Data Mining Extensions (DMX) Reference, Data Mining Schema Rowsets, Analysis Services Administration Programming (SSAS), Using XML for Analysis in Analysis Services (XMLA)

See Also


Developing Analysis Services Solutions and Projects
Data Mining Concepts
Mining Structures (Analysis Services)
SQL Server Analysis Services

Other Resources

Data Mining Extensions (DMX) Reference
Introducing Business Intelligence Development Studio
SQL Server Management Studio and Business Intelligence Development Studio

Help and Information

Getting SQL Server 2005 Assistance