Permutation Feature Importance

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

Computes the permutation feature importance scores of feature variables, given a trained model and a test dataset.

Category: Feature Selection Modules

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Module overview

This article describes how to use the Permutation Feature Importance module in Machine Learning Studio (classic) to compute a set of feature importance scores for your dataset. You can use these scores to help you determine the best features to use in a model.

In this module, feature values are randomly shuffled, one column at a time, and the performance of the model is measured before and after. You can choose one of the standard metrics provided to measure performance.

The scores that the module returns represent the change in the performance of a trained model after permutation. Important features are usually more sensitive to the shuffling process and thus receive higher importance scores.

For a general overview of permutation feature importance, its theoretical basis, and its applications in machine learning, see this article: Permutation feature importance
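The following sketch illustrates the basic idea outside of Studio. It is a generic Python/scikit-learn illustration under assumed data and model choices, not the module's own implementation: train a model, then shuffle one feature column at a time and record how much a chosen performance metric degrades.

```python
# Illustrative sketch only; not the Studio (classic) module's implementation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Train a model on one split and hold out a separate test set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

baseline = accuracy_score(y_test, model.predict(X_test))

rng = np.random.default_rng(0)               # fixed seed for reproducible shuffles
importances = {}
for col in range(X_test.shape[1]):
    X_perm = X_test.copy()
    rng.shuffle(X_perm[:, col])              # permute a single feature column
    permuted = accuracy_score(y_test, model.predict(X_perm))
    importances[col] = baseline - permuted   # larger drop => more important feature

# Print the five most important features, ranked by score, descending.
for col, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(f"feature {col}: {score:.4f}")
```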

How to use Permutation Feature Importance

To generate a set of feature scores, you need an already trained model and a test dataset.

  1. Add the Permutation Feature Importance module to your experiment. You can find this module in the Feature Selection category.

  2. Connect a trained model to the left input. The model must be a regression model or classification model.

  3. On the right input, connect a dataset, preferably one that is different from the dataset used for training the model. This dataset is used for scoring based on the trained model, and for evaluating the model after feature values have been changed.

  4. For Random seed, type a value to use as the seed for randomization. If you specify 0 (the default), a number is generated based on the system clock.

    A seed value is optional, but you should provide a value if you want reproducibility across runs of the same experiment.

  5. For Metric for measuring performance, select a single metric to use when computing model quality after permutation.

    Machine Learning Studio (classic) supports the following metrics, depending on whether you are evaluating a classification or regression model:

    • Classification

      Accuracy, Precision, Recall, Average Log Loss

    • Regression

      Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Relative Squared Error, Coefficient of Determination

    For a more detailed description of these evaluation metrics, and how they are calculated, see Evaluate Model.

  6. Run the experiment.

  7. The module outputs a list of feature columns and their associated scores, ranked in descending order of the scores.
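For comparison, scikit-learn's permutation_importance function follows the same pattern as the steps above: a trained model, a held-out dataset, a single scoring metric, a random seed, and a descending ranking of the resulting scores. This is an illustrative local equivalent under assumed data and model choices, not the Studio (classic) module itself.

```python
# Illustrative only; mirrors the workflow above but is not the Studio module.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# scoring plays the role of "Metric for measuring performance";
# random_state plays the role of "Random seed".
result = permutation_importance(
    model, X_test, y_test,
    scoring="neg_mean_absolute_error",
    n_repeats=10,
    random_state=0,
)

# Rank features by mean importance, descending, like the module's output.
ranking = sorted(enumerate(result.importances_mean), key=lambda kv: kv[1], reverse=True)
for idx, score in ranking:
    print(f"feature {idx}: {score:.4f}")
```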

Examples

See these sample experiments in the Azure AI Gallery:

Technical notes

This section provides implementation details, tips, and answers to frequently asked questions.

How does this compare to other feature selection methods?

Permutation feature importance works by randomly changing the values of each feature column, one column at a time, and then evaluating the model.

The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created.

This is because permutation feature importance doesn’t measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
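The following hypothetical example (synthetic data and scikit-learn scoring functions, chosen here only for illustration) shows why the two kinds of rankings can diverge: a filter-based score can rate a redundant copy of a useful feature highly, while permutation importance may rate it low, because the trained model can fall back on the original feature when the copy is shuffled.

```python
# Illustrative contrast between a filter-based score and permutation importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.05, size=n)   # near-duplicate of x0
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = (x0 + 0.5 * x2 > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Filter-based score: computed from the data alone, before any model exists.
filter_scores = mutual_info_classif(X_train, y_train, random_state=0)

# Permutation importance: measures influence on this specific trained model.
perm = permutation_importance(model, X_test, y_test, random_state=0)

print("filter-based (mutual information):", np.round(filter_scores, 3))
print("permutation importance:           ", np.round(perm.importances_mean, 3))
# x1 tends to score high on the filter metric (it mirrors x0) but can score low
# on permutation importance, because the model can rely on x0 when x1 is shuffled.
```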

Expected inputs

| Name | Type | Description |
| --- | --- | --- |
| Trained model | ILearner interface | A trained classification or regression model |
| Test data | Data Table | Test dataset for scoring and evaluating a model after permutation of feature values |

Module parameters

| Name | Type | Range | Optional | Default | Description |
| --- | --- | --- | --- | --- | --- |
| Random seed | Integer | >=0 | Required | 0 | Random number generator seed value |
| Metric for measuring performance | EvaluationMetricType | Select from list | Required | Classification - Accuracy | Select the metric to use when evaluating the variability of the model after permutations |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| Feature importance | Data Table | A dataset containing the feature importance results, based on the selected metric |

Exceptions

| Exception | Description |
| --- | --- |
| Error 0062 | Exception occurs when attempting to compare two models with different learner types. |
| Error 0024 | Exception occurs if the dataset does not contain a label column. |
| Error 0105 | Thrown when a module definition file defines an unsupported parameter type. |
| Error 0021 | Exception occurs if the number of rows in some of the datasets passed to the module is too small. |

See also

Feature Selection
Filter Based Feature Selection
Principal Component Analysis