Modeling Flags (Data Mining)

You can use modeling flags in SQL Server 2008 Analysis Services (SSAS) to provide additional information to a data mining algorithm about the data that is defined in a case table. The algorithm can use this information to build a more accurate data mining model.

You can use Data Mining Extensions (DMX) to define modeling flags programmatically, or you can define them in Data Mining Designer in Business Intelligence Development Studio. For more information about how to define these flags, see Mining Model Columns.

Some modeling flags are defined at the level of the mining structure, whereas others are defined at the level of the mining model column. For example, the NOT NULL modeling flag is used with mining structure columns. You can define additional modeling flags on the mining model column.

The following list describes the modeling flags that are supported in Analysis Services. For information about modeling flags that are supported by specific algorithms, see the technical reference topic for the algorithm.

  • NOT NULL
    Indicates that the values for the attribute column should never contain a null value. An error will result if Analysis Services encounters a null value for this attribute column during the model training process.

  • MODEL_EXISTENCE_ONLY
    Indicates that the column will be treated as having two states: Missing and Existing. If the value is NULL, it is treated as Missing.

    Note

    Missing is a special state used by the algorithm, and differs from the text value "Missing" in a column. For more information, see Missing Values (Analysis Services - Data Mining).

    A typical use for this modeling flag is to indicate attributes where the NULL state has an implicit meaning, and the explicit value of the NOT NULL state might not be as important as the fact that the column has any value. For example, a [DateContractSigned] column might be NULL if a contract was never signed and NOT NULL if the contract was signed. Therefore, if the purpose of the model is to predict whether a contract will be signed, you can use the MODEL_EXISTENCE_ONLY flag to ignore the exact date value in the NOT NULL cases and distinguish only between cases where the value is Missing or Existing.

  • REGRESSOR
    The values for the attribute column are less important than the presence of the attribute. This flag is defined on a mining model column.

Note

Third-party plug-ins might have other modeling flags, in addition to those pre-defined by Analysis Services.

Viewing and Changing Modeling Flags

In Data Mining Designer, you can view and modify the modeling flags associated with a mining structure or mining column by viewing the properties of the structure or model.

To view or change the modeling flag for a structure column or model column

  1. In BI Development Studio, In Solution Explorer, double-click the mining structure.

  2. To set the NOT NULL modeling flag, click the Mining Structure tab.

    To set the REGRESSOR or MODEL_EXISTENCE_ONLY flags, click the Mining Model tab.

  3. Right-click the column you want to view or change, and select Properties.

  4. To add a new modeling flag, click the text box next to the ModelingFlags property, and select the check box or check boxes for the modeling flags you want to use.

    Modeling flags are displayed only if they are appropriate for the column data type.

    Note

    After you change a modeling flag, you must reprocess the model.

You cannot change the modeling flags used in an existing mining model and structure by using DMX. You must create a new mining model by using the ALTER MINING STRUCTURE….ADD MINING MODEL syntax.

If you are not sure which modeling flags are being used in the current structure, you can create a query that returns the modeling flags by using the following syntax:

SELECT COLUMN_NAME, MODELING_FLAG
FROM $system.DMSCHEMA_MINING_STRUCTURE_COLUMNS
WHERE STRUCTURE_NAME = '<structure name>'

Using the REGRESSOR Modeling Flag

When you set the REGRESSOR modeling flag on a column, you are indicating to the algorithm that the column contains potential regressors. The actual regressors that are used in the model are determined by the algorithm. A potential regressor can be discarded if it does not model the predictable attribute.

When you build a model using the Data Mining wizard, all continuous input columns are flagged as possible regressors. Therefore, even if you do not explicitly set the REGRESSOR flag on a column, the column might be used as a regressor in the final model.

You can determine the regressors that were actually used in the final model by performing a query against the schema rowset for the mining model, as shown in the following example:

SELECT COLUMN_NAME, MODELING_FLAG
FROM $system.DMSCHEMA_MINING_columnS
WHERE MODEL_NAME = '<model name>'

Note   If you modify a mining model and change the content type of a column from continuous to discrete, you must manually change the flag on the mining column and then reprocess the model.

Regressors in Linear Regression Models

Linear regression models are based on the Microsoft Decision Trees algorithm. Even if you do not use the Microsoft Linear Regression algorithm, any decision tree model can contain a tree or nodes that represents a regression on a continuous attribute.

You do not need to specify that a continuous column represents a regressor. The Microsoft Decision Trees algorithm will partition the dataset into regions with meaningful patterns even if you do not set the REGRESSOR flag on the column. The difference is that when you set the modeling flag, the algorithm will try to find regression equations of the form a*C1 + b*C2 + ... to fit the patterns in the nodes of the tree. The sum of the residuals is calculated, and if the deviation is too great, a split is forced in the tree.

For example, if you are predicting customer purchasing behavior using Income as an attribute, and set the REGRESSOR modeling flag on the column, the algorithm would first try to fit the Income values by using a standard regression formula. If the deviation is too great, the regression formula is abandoned and the tree would be split on some other attribute. The decision tree algorithm would then try fit a regressor for income in each of the branches after the split.

You can use the FORCED_REGRESSOR parameter to guarantee that the algorithm will use a particular regressor. This parameter can be used with the Decision Trees algorithm and Linear Regression algorithm.