Permutation Feature Importance

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Computes the permutation feature importance scores of feature variables, given a trained model and a test dataset.

Category: Feature Selection Modules

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Module overview

This article describes how to use the Permutation Feature Importance module in Machine Learning Studio (classic) to compute a set of feature importance scores for your dataset. You can use these scores to help you determine the best features to use in a model.

In this module, feature values are randomly shuffled, one column at a time, and the performance of the model is measured before and after. You can choose one of the standard metrics provided to measure performance.

The scores that the module returns represent the change in the performance of a trained model after permutation. Important features are usually more sensitive to the shuffling process and thus receive higher importance scores.

For a general overview of permutation feature importance, its theoretical basis, and its applications in machine learning, see this article: Permutation feature importance
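The following sketch illustrates the basic idea outside of Studio. It is a generic Python/scikit-learn illustration under assumed data and model choices, not the module's own implementation: train a model, then shuffle one feature column at a time and record how much a chosen performance metric degrades.

```python
# Illustrative sketch only; not the Studio (classic) module's implementation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Train a model on one split and hold out a separate test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

baseline = accuracy_score(y_test, model.predict(X_test))

rng = np.random.default_rng(0)               # fixed seed for reproducible shuffles
importances = {}
for col in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, col])              # permute a single feature column
    permuted = accuracy_score(y_test, model.predict(X_perm))
    importances[col] = baseline - permuted   # larger drop => more important feature

# Print the five most important features, ranked by score, descending.
for col, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"feature {col}: {score:.4f}")
```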

How to use Permutation Feature Importance

To generate a set of feature scores, you need an already trained model and a test dataset.

  1. Add the Permutation Feature Importance module to your experiment. You can find this module in the Feature Selection category.

  2. Connect a trained model to the left input. The model must be a regression model or classification model.

  3. On the right input, connect a dataset, preferably one that is different from the dataset used for training the model. This dataset is used for scoring based on the trained model, and for evaluating the model after feature values have been changed.

  4. For Random seed, type a value to use as the seed for randomization. If you specify 0 (the default), a number is generated based on the system clock.

    A seed value is optional, but you should provide a value if you want reproducibility across runs of the same experiment.

  5. For Metric for measuring performance, select a single metric to use when computing model quality after permutation.

    Machine Learning Studio (classic) supports the following metrics, depending on whether you are evaluating a classification or regression model:

    • Classification

      Accuracy, Precision, Recall, Average Log Loss

    • Regression

      Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Relative Squared Error, Coefficient of Determination

    For a more detailed description of these evaluation metrics, and how they are calculated, see Evaluate Model.

  6. Run the experiment.

  7. The module outputs a list of feature columns and their associated scores, ranked in descending order of the scores.
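For comparison, scikit-learn's permutation_importance function follows the same pattern as the steps above: a trained model, a held-out dataset, a single scoring metric, a random seed, and a descending ranking of the resulting scores. This is an illustrative local equivalent under assumed data and model choices, not the Studio (classic) module itself.

```python
# Illustrative only; mirrors the workflow above but is not the Studio module.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# scoring plays the role of "Metric for measuring performance";
# random_state plays the role of "Random seed".
result = permutation_importance(
    model, X_test, y_test,
    scoring="neg_mean_absolute_error",
    n_repeats=10,
    random_state=0,
)

# Rank features by mean importance, descending, like the module's output.
ranking = sorted(enumerate(result.importances_mean), key=lambda kv: kv[1], reverse=True)
for idx, score in ranking:
    print(f"feature {idx}: {score:.4f}")
```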

Examples

See these sample experiments in the Azure AI Gallery:

Technical notes

This section provides implementation details, tips, and answers to frequently asked questions.

How does this compare to other feature selection methods?

Permutation feature importance works by randomly changing the values of each feature column, one column at a time, and then evaluating the model.

The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created.

This is because permutation feature importance doesn’t measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
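The following hypothetical example (synthetic data and scikit-learn scoring functions, chosen here only for illustration) shows why the two kinds of rankings can diverge: a filter-based score can rate a redundant copy of a useful feature highly, while permutation importance may rate it low, because the trained model can fall back on the original feature when the copy is shuffled.

```python
# Illustrative contrast between a filter-based score and permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.05, size=n)   # near-duplicate of x0
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = (x0 + 0.5 * x2 > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Filter-based score: computed from the data alone, before any model exists.
filter_scores = mutual_info_classif(X_train, y_train, random_state=0)

# Permutation importance: measures influence on this specific trained model.
perm = permutation_importance(model, X_test, y_test, random_state=0)

print("filter-based (mutual information):", np.round(filter_scores, 3))
print("permutation importance:           ", np.round(perm.importances_mean, 3))
# x1 tends to score high on the filter metric (it mirrors x0) but can score low
# on permutation importance, because the model can rely on x0 when x1 is shuffled.
```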

Expected inputs

| Name | Type | Description |
| --- | --- | --- |
| Trained model | ILearner interface | A trained classification or regression model |
| Test data | Data Table | Test dataset for scoring and evaluating a model after permutation of feature values |

Module parameters

| Name | Type | Range | Optional | Default | Description |
| --- | --- | --- | --- | --- | --- |
| Random seed | Integer | >=0 | Required | 0 | Random number generator seed value |
| Metric for measuring performance | EvaluationMetricType | Select from list | Required | Classification - Accuracy | Select the metric to use when evaluating the variability of the model after permutations |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| Feature importance | Data Table | A dataset containing the feature importance results, based on the selected metric |

Exceptions

| Exception | Description |
| --- | --- |
| Error 0062 | Exception occurs when attempting to compare two models with different learner types. |
| Error 0024 | Exception occurs if the dataset does not contain a label column. |
| Error 0105 | Thrown when a module definition file defines an unsupported parameter type. |
| Error 0021 | Exception occurs if the number of rows in some of the datasets passed to the module is too small. |

See also

Feature Selection
Filter Based Feature Selection
Principal Component Analysis