Permutation Feature Importance
Computes the permutation feature importance scores of feature variables, given a trained model and a test dataset.
Category: Feature Selection Modules
Applies to: Machine Learning Studio
This content pertains only to Studio. Similar drag and drop modules have been added to the visual interface in Machine Learning service. Learn more in this article comparing the two versions.
This article describes how to use the Permutation Feature Importance module in Azure Machine Learning Studio to compute a set of feature importance scores for your dataset. You can use these scores to help determine the best features to use in a model.
In this module, feature values are randomly shuffled, one column at a time, and the performance of the model is measured before and after each shuffle. You can choose one of the standard metrics provided to measure performance.
The scores that the module returns represent the change in the performance of a trained model, after permutation. Important features are usually more sensitive to the shuffling process, and will thus result in higher importance scores.
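The shuffle-and-rescore procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual implementation: the function `permutation_importance`, the toy `predict` rule, the sample data, and the sign convention (baseline score minus permuted score, which suits higher-is-better metrics such as accuracy) are all assumptions made for the example. Unlike the module, the sketch does not treat a seed of 0 specially.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], int],
    X: List[List[float]],
    y: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float],
    seed: int,
) -> List[float]:
    """Return one score per column: baseline metric minus metric after shuffling."""
    rng = random.Random(seed)  # fixed seed so repeated runs use the same shuffles
    baseline = metric(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        shuffled = [row[j] for row in X]
        rng.shuffle(shuffled)  # permute only column j; other columns stay intact
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
        permuted = metric(y, [predict(row) for row in X_perm])
        scores.append(baseline - permuted)
    return scores


# Hypothetical "trained model": it looks only at column 0 and ignores column 1.
def predict(row: List[float]) -> int:
    return 1 if row[0] > 0.5 else 0


X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
y = [predict(row) for row in X]  # labels that the toy model reproduces exactly

scores = permutation_importance(predict, X, y, accuracy, seed=42)
print(scores)
```

Because the toy model never reads column 1, shuffling that column leaves accuracy unchanged and its score is 0. Shuffling column 0 can only lower accuracy, so its score is at least as large.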
For a good general overview of permutation feature importance, its theoretical basis, and its applications in machine learning, see the article Permutation feature importance.
How to use Permutation Feature Importance
Generating a set of feature scores requires an already trained model and a test dataset.
Add the Permutation Feature Importance module to your experiment. You can find this module in the Feature Selection category.
Connect a trained model to the left input. The model must be a regression model or classification model.
On the right input, connect a dataset, preferably one that is different from the dataset used for training the model. This dataset is used for scoring based on the trained model, and for evaluating the model after feature values have been changed.
For Random seed, type a value to use as the seed for randomization. If you specify 0 (the default), a number is generated based on the system clock.
A seed value is optional, but you should provide a value if you want reproducibility across runs of the same experiment.
For Metric for measuring performance, select a single metric to use when computing model quality after permutation.
Azure Machine Learning Studio supports the following metrics, depending on whether you are evaluating a classification or regression model:
Classification: Accuracy, Precision, Recall, Average Log Loss
Regression: Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Relative Squared Error, Coefficient of Determination
For a more detailed description of these evaluation metrics, and how they are calculated, see Evaluate.
Run the experiment.
The module outputs a list of feature columns and the scores associated with them, ranked in order of the scores, descending.
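For orientation, the regression metrics named in the steps above can be written out using their common textbook definitions. This is a sketch under that assumption: the article does not show the module's own formulas (see Evaluate for those), and the function name `regression_metrics` and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Compute common regression metrics from true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Deviations of the true values from their mean (the "predict the mean" baseline).
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    sq_dev = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "relative_absolute_error": abs_err / abs_dev,
        "relative_squared_error": sq_err / sq_dev,
        "coefficient_of_determination": 1.0 - sq_err / sq_dev,
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The first four are error measures, where lower is better, while the coefficient of determination is higher-is-better, so the direction of a "change in performance" depends on the metric chosen.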
See these sample experiments in the Azure AI Gallery:
Permutation Feature Importance: Demonstrates how to use this module to rank feature variables of a dataset in order of permutation importance scores.
Using the Permutation Feature Importance module: Illustrates the usage of this module in a web service.
This section provides implementation details, tips, and answers to frequently asked questions.
How does this compare to other feature selection methods?
Permutation feature importance works by randomly changing the values of each feature column, one column at a time, and then evaluating the model.
The rankings provided by permutation feature importance are often different from the ones you get from Filter Based Feature Selection, which calculates scores before a model is created.
This is because permutation feature importance doesn’t measure the association between a feature and a target value, but instead captures how much influence each feature has on predictions from the model.
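The difference can be illustrated with a small, contrived example. In the sketch below, Pearson correlation stands in for a filter-style score computed without a model, and a hand-written `predict` function stands in for a trained regression model. Both, along with the data and the helper names, are assumptions made for illustration only.

```python
import math
import random
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical "trained model": it uses only column 0.
def predict(row: List[float]) -> float:
    return 2.0 * row[0]


x0 = [float(i) for i in range(20)]
# Column 1 is a slightly jittered copy of column 0, so it also tracks the target.
x1 = [v + (0.5 if i % 2 == 0 else -0.5) for i, v in enumerate(x0)]
X = [[a, b] for a, b in zip(x0, x1)]
y = [2.0 * a for a in x0]

# Filter-style score: association between each column and the target, no model.
filter_scores = [abs(pearson(x0, y)), abs(pearson(x1, y))]

# Permutation score: rise in the model's error after shuffling each column.
rng = random.Random(7)
baseline = mean_absolute_error(y, [predict(r) for r in X])
perm_scores = []
for j in range(2):
    col = [r[j] for r in X]
    rng.shuffle(col)
    X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
    permuted = mean_absolute_error(y, [predict(r) for r in X_perm])
    perm_scores.append(permuted - baseline)

print("filter:", filter_scores)
print("permutation:", perm_scores)
```

Both columns look almost equally useful to the correlation score, yet the permutation score for the second column is 0 because this particular model never uses it.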
Expected inputs
|Name|Type|Description|
|---|---|---|
|Trained model|ILearner interface|A trained classification or regression model|
|Test data|Data Table|Test dataset for scoring and evaluating a model after permutation of feature values|
Module parameters
|Name|Type|Range|Optional|Default|Description|
|---|---|---|---|---|---|
|Random seed|Integer|>=0|Required|0|Random number generator seed value|
|Metric for measuring performance|EvaluationMetricType|select from list|Required|Classification - Accuracy|Select the metric to use when evaluating the variability of the model after permutations|
Outputs
|Name|Type|Description|
|---|---|---|
|Feature importance|Data Table|A dataset containing the feature importance results, based on the selected metric|
Exceptions
|Exception|Description|
|---|---|
|Error 0062|Exception occurs when attempting to compare two models with different learner types.|
|Error 0024|Exception occurs if dataset does not contain a label column.|
|Error 0105|Thrown when a module definition file defines an unsupported parameter type.|
|Error 0021|Exception occurs if number of rows in some of the datasets passed to the module is too small.|