Data Mining Queries
Data mining queries are useful for many purposes. You can:
Apply the model to new data, to make single or multiple predictions. You can provide input values as parameters, or in a batch.
Get a statistical summary of the data used for training.
Extract patterns and rules, or generate a profile of the typical case representing a pattern in the model.
Extract regression formulas and other calculations that explain patterns.
Get the cases that fit a particular pattern.
Retrieve details about individual cases used in the model, including data not used in analysis.
Retrain a model by adding new data, or perform cross-prediction.
This section provides an overview of the information you need to get started with data mining queries. It describes the types of queries you can create against data mining objects, introduces the query tools and query languages, and provides links to examples of queries that you can create against models that were built using the algorithms provided in SQL Server Data Mining.
Understanding Data Mining Queries
Analysis Services Data Mining supports the following types of queries:
Queries that make inferences based on patterns in the model, and from input data.
Queries that return metadata, statistics, and other information about the model itself.
Queries that can retrieve the underlying case data for the model, or even data from the structure that was not used in the model.
Queries that do not return information from the model, but rather are used to build models and structures or to update the data in a model or structure.
Before you create queries, we recommend that you familiarize yourself with the differences between models created with each of the data mining algorithms provided by SQL Server.
Browse and explore each model type by using the custom data mining viewers that are provided for each algorithm type. For more information, see Mining Model Viewer Tasks and How-tos.
Review the model content for each model type, by using the Microsoft Generic Content Tree Viewer. To interpret this information, refer to Mining Model Content (Analysis Services - Data Mining).
Query Tools and Interfaces
You can build data mining queries interactively by using one of the query tools provided by SQL Server. The graphical Prediction Query Builder is provided in both SQL Server Data Tools (SSDT) and SQL Server Management Studio. If you have not used the Prediction Query Builder before, we recommend that you follow the steps in the Basic Data Mining Tutorial to familiarize yourself with the interface. For q quick overview of the steps, see Create a Query using the Create a Prediction Query Using the Prediction Query Builder.
The Prediction Query Builder is helpful for starting queries that you will customize later. You can easily add data sources and map them to columns, and then switch to DMX view and customize the query by adding a WHERE clause or other functions.
Once you are familiar with data mining models and how to build queries, you can also write queries directly by using Data Mining Extensions (DMX). DMX is a query language that is similar to Transact-SQL, and that you can use from many different clients. DMX is the tool of choice for creating both custom predictions and complex queries. For an introduction to DMX, see Creating and Querying Data Mining Models with DMX: Tutorials (Analysis Services - Data Mining).
DMX editors are provided in both SQL Server Data Tools (SSDT) and SQL Server Management Studio. You can also use the Prediction Query Builder to start your queries, then change the view to the text editor and copy the DMX statement to another client. For more information, see Data Mining Query Tools.
You can compose DMX statements programmatically and send them from your client to the Analysis Services server by using AMO or XMLA. However, DMX is the language that you must use to create queries against a mining model.
You can also query the metadata, statistics, and some content of the model by using Dynamic Management Views (DMVs) that are based on the data mining schema rowsets. These DMVs make it easy to retrieve information about the model by typing SELECT statements; however, you cannot create predictions. For more information about DMVs supported by Analysis Services, see Use Dynamic Management Views (DMVs) to Monitor Analysis Services.
Finally, you can create data mining queries for use in Integration Services packages, by using the Data Mining Query Task, or the Data Mining Query Transformation. The control flow task supports multiple types of DMX queries, whereas the data flow transformation supports only queries that work with data in the data flow, meaning queries that use the PREDICTION JOIN syntax.
Queries for Different Model Types
The algorithm that was used when the model was created greatly influences the type of information that you can get from a data mining query. The reason for the differences is that each algorithm processes the data in a different way, and stores different kinds of patterns. For example, some algorithms create clusters; others create trees. Therefore, you might need to use specialized prediction and query functions, depending on the type of model that you are working with.
The following list provides a summary of the functions that you can use in queries:
General prediction functions: The Predict function is polymorphic, meaning it works with all model types. This function will automatically detect the type of model you are working with and prompt you for additional parameters. For more information, see Predict (DMX).
Not all models are used to make predictions. For example, you can create a clustering model that does not have a predictable attribute. However, even if a model does not have a predictable attribute, you can create prediction queries that return other types of useful information from the model.
Custom prediction functions: Each model type provides a set of prediction functions designed for working with the patterns created by that algorithm.
For example, the Lag function is provided for time series models, to let you view the historical data used for the model. For clustering models, functions such as ClusterDistance are more meaningful.
For more information about the functions that are supported for each model type, see the following links:
You can also call VBA functions, or create your own functions. For more information, see Functions (DMX).
General statistics: There are a number of functions that can be used with almost any model type, which return a standard set of descriptive statistics, such as standard deviation.
For example, the PredictHistogram function returns a table that lists all the states of the specified column.
For more information, see General Prediction Functions (DMX).
Custom statistics: Additional supporting functions are provided for each model type, to generate statistics that are relevant to the specific analytical task.
For example, when you are working with a clustering model, you can use the function, PredictCaseLikelihood, to return the likelihood score associated with a certain case and cluster. However, if you created a linear regression model, you would be more interested in retrieving the coefficient and intercept, which you can do using a content query.
Model content functions: The content of all models is represented in a standardized format that lets you retrieve information with a simple query. You create queries on the model content by using DMX. You can also get some type of model content by using the data mining schema rowsets.
In the model content, the meaning of each row or node of the table that is returned differs depending on the type of algorithm that was used to build the model, as well as the data type of the column. For more information, see Content Queries (Data Mining).
Before you can create a query against a model, the data mining model must have been processed. Processing of Analysis Services objects requires special permissions. For more information on processing mining models, see Processing Requirements and Considerations (Data Mining).
To execute queries against a data mining model requires different levels of permissions, depending on the type of query that you run. For example, drillthrough to case or structure data typically requires additional permissions which can be set on the mining structure object or mining model object.
However, if your query uses external data, and includes statements such as OPENROWSET or OPENQUERY, the database that you are querying must enable these statements, and you must have permission on the underlying database objects.
For more information on the security contexts required to run data mining queries, see Security Overview (Data Mining)
In This Section
The topics in this section introduce each type of data mining query in more detail, and provide links to detailed examples of how to create queries against data mingin models.
Use these links to learn how to create and work with data mining queries.