Linear Discriminant Analysis (deprecated)
Identify a linear combination of variables that best separates two or more classes
Category: Feature Selection Modules
This article describes how to use the Linear Discriminant Analysis module in Azure Machine Learning Studio to create a set of scores that identifies the combination of features that best separates two or more classes.
You provide a set of possible feature columns as inputs, and the algorithm determines the optimal combination of the input columns that linearly separates each group of data while minimizing the distances within each group.
This module is provided solely for backward compatibility with experiments created using the pre-release version of Azure Machine Learning. We recommend that you modify your experiments to use Fisher Linear Discriminant Analysis instead.
More about linear discriminant analysis
Linear discriminant analysis is often used for dimensionality reduction, because it projects a set of features onto a smaller feature space while preserving the information that discriminates between classes. This not only reduces computational costs for a given classification task, but can help prevent overfitting.
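The separation idea can be written compactly. The notation below is the standard textbook formulation of Fisher's criterion and is an assumption of this sketch, since the article itself defines no symbols: $\mu_c$ and $n_c$ are the mean and size of class $c$, $\mu$ is the overall mean, and $w$ is a candidate projection direction.

```latex
\[
S_W = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{D}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top},
\qquad
S_B = \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}
\]
\[
J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}
\]
```

Maximizing $J(w)$ favors directions along which class means are far apart (large between-class scatter $S_B$) while each class stays compact (small within-class scatter $S_W$). The maximizers are the eigenvectors of $S_W^{-1} S_B$ with the largest eigenvalues, and at most $C - 1$ of those eigenvalues are nonzero, which is why the projected space is smaller than the original feature space.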
Linear discriminant analysis is similar to analysis of variance (ANOVA) in that it works by comparing the means of the variables, and is based on these assumptions:
- Predictors are independent
- Values are normally distributed
- Variances among groups are similar
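As an informal way to look at the last assumption, the following sketch compares the sample variance of one feature across groups. The function names and the sample data are hypothetical, and the article does not specify what ratio counts as "similar", so treat the result as a rough diagnostic only.

```python
from statistics import mean, variance


def group_summary(values, labels):
    """Return {label: (mean, sample variance)} for one numeric feature."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: (mean(vs), variance(vs)) for label, vs in groups.items()}


def variance_ratio(summary):
    """Largest group variance divided by the smallest (1.0 means identical).

    Assumes every group has a nonzero variance.
    """
    variances = [var for _, var in summary.values()]
    return max(variances) / min(variances)


# Illustrative data, not taken from the article.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
labels = ["a", "a", "a", "b", "b", "b"]

summary = group_summary(values, labels)
ratio = variance_ratio(summary)
```

A ratio close to 1 suggests the groups have comparable spread for that feature; a very large ratio is a hint that the equal-variance assumption may not hold.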
Linear Discriminant Analysis is sometimes abbreviated to LDA, but this is easily confused with Latent Dirichlet Allocation. The techniques are completely different.
How to use Linear Discriminant Analysis
Add the Linear Discriminant Analysis module to your experiment in Studio, and connect the dataset you want to evaluate.
Select a set of numeric feature columns as inputs. The columns provided as inputs must meet these requirements:
- Your data must be complete (no missing values).
- Ideally, there are fewer predictors than samples.
- Because the values are expected to have a normal distribution, you should review the data for outliers.
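The checks in this list can be sketched outside Studio in plain Python. This is a minimal illustration under stated assumptions: `check_inputs` is a hypothetical helper, missing values are represented as `None` or NaN, and the z-score rule for flagging outliers is one common heuristic that the article does not prescribe.

```python
import math
from statistics import mean, stdev


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_inputs(rows, z_threshold=3.0):
    """rows: list of equal-length lists, one inner list per sample."""
    n_samples = len(rows)
    n_features = len(rows[0])
    missing_rows = [i for i, row in enumerate(rows)
                    if any(is_missing(v) for v in row)]
    outliers = []  # (row index, column index) pairs
    for j in range(n_features):
        present = [(i, row[j]) for i, row in enumerate(rows)
                   if not is_missing(row[j])]
        column = [v for _, v in present]
        if len(column) < 2:
            continue
        mu, sigma = mean(column), stdev(column)
        if sigma == 0:
            continue
        outliers += [(i, j) for i, v in present
                     if abs(v - mu) / sigma > z_threshold]
    return {
        "rows_with_missing": missing_rows,
        "fewer_predictors_than_samples": n_features < n_samples,
        "outliers": outliers,
    }


# Illustrative data: row 2 has a missing value, row 7 has an extreme value.
rows = [
    [1.0, 10.0], [1.1, 11.0], [0.9, None], [1.0, 10.5],
    [1.2, 9.5], [0.8, 10.0], [1.0, 10.2], [9.0, 9.8],
]
report = check_inputs(rows, z_threshold=2.0)
```

The threshold is lowered to 2.0 in the example because, with only a handful of samples, a z-score can never reach 3; on realistic datasets the default is a more typical starting point.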
Run the experiment.
The algorithm determines the optimal combination of the input columns that linearly separates each group of data while minimizing the distances within each group.
The module has two outputs:
- Feature Extractors: A set of scores (eigenvectors), also called a discrimination matrix.
- Transformed Features: A dataset containing the features that have been transformed using the eigenvectors.
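To make the two outputs concrete, here is a minimal NumPy sketch of textbook Fisher discriminant analysis. It is not the module's actual implementation, which the article does not show; the function name `fisher_lda`, its parameters, and the sample data are all assumptions made for illustration. The returned `W` plays the role of the feature extractors (eigenvectors) and `Z` the role of the transformed features.

```python
import numpy as np


def fisher_lda(X, y, n_components=None):
    """Return (W, Z): eigenvector matrix and the data projected onto it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]

    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)

    # Eigen-decomposition of S_w^{-1} S_b; assumes S_w is invertible.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first

    if n_components is None:
        n_components = len(classes) - 1  # at most C - 1 useful directions
    W = eigvecs[:, order[:n_components]]
    return W, X @ W


# Illustrative two-class data, not taken from the article.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.8, 2.1],
              [5.0, 2.0], [5.5, 1.9], [5.2, 2.3], [4.8, 2.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

W, Z = fisher_lda(X, y)
```

With two classes the result is a single discriminant direction, so `W` has one column and `Z` has one value per row. The sign and scale of each eigenvector are arbitrary, so scores from different tools can differ by a constant factor while still separating the classes equally well.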
To see examples of how feature selection is used in machine learning experiments, see the Azure AI Gallery:
- Twitter Sentiment Analysis: Uses Filter Based Feature Selection to improve experiment results.
- Fisher Linear Discriminant Analysis: Demonstrates how to use this module for dimensionality reduction.
Technical notes
This section contains implementation details, tips, and answers to frequently asked questions.
This method works only on continuous variables, not categorical or ordinal variables.
Rows with missing values are ignored when computing the transformation matrix.
The algorithm will examine all numeric columns not designated as labels, to see if there is any correlation. If you want to exclude a numeric column, add a Select Columns in Dataset module before feature selection to create a view that contains only the columns you wish to analyze.
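The column and row handling described in these notes can be mimicked in plain Python. This sketch only illustrates the behavior stated above (numeric non-label columns are analyzed, and rows with missing values are skipped); the helper names, the `exclude` parameter, and the sample rows are hypothetical and do not correspond to any Studio API.

```python
from numbers import Real


def numeric_feature_columns(rows, label_column, exclude=()):
    """Names of numeric columns, skipping the label and excluded columns."""
    def is_numeric(name):
        values = [r[name] for r in rows if r[name] is not None]
        return bool(values) and all(
            isinstance(v, Real) and not isinstance(v, bool) for v in values)

    return [name for name in rows[0]
            if name != label_column and name not in exclude
            and is_numeric(name)]


def complete_rows(rows, columns):
    """Keep only rows with no missing value in the given columns."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


# Illustrative rows; "id" is numeric but should not be analyzed.
rows = [
    {"id": 1, "height": 1.2, "weight": 3.4, "species": "a"},
    {"id": 2, "height": None, "weight": 3.1, "species": "a"},
    {"id": 3, "height": 2.5, "weight": 6.0, "species": "b"},
]

features = numeric_feature_columns(rows, label_column="species", exclude=("id",))
usable = complete_rows(rows, features)
```

Excluding a column such as an identifier before the analysis plays the same role as placing a Select Columns in Dataset module upstream of this module.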
For more information about how the eigenvalues are calculated, see this paper (PDF download):
- Eigenvector-based Feature Extraction for Classification. Tsymbal, Puuronen et al.
Expected inputs
|Name|Type|Description|
|---|---|---|
|Dataset|Data Table|Input dataset|
Module parameters
|Name|Range|Type|Default|Description|
|---|---|---|---|---|
|Class labels column|any|ColumnSelection|None|Select the column that contains the categorical class labels|
Outputs
|Name|Type|Description|
|---|---|---|
|Feature extractors|Data Table|Eigenvectors of input dataset|
|Transformed features|Data Table|Fisher linear discriminant analysis features transformed to eigenvector space|
Exceptions
|Exception|Description|
|---|---|
|Error 0001|Exception occurs if one or more specified columns of the data set couldn't be found.|
|Error 0003|Exception occurs if one or more inputs are null or empty.|
|Error 0017|Exception occurs if one or more specified columns have a type unsupported by the current module.|
For a list of errors specific to Studio modules, see Machine Learning Error codes.
For a list of API exceptions, see Machine Learning REST API Error Codes.