Feature Selection And Feature Importance: How Are They Related?

How are feature selection and feature importance related? This is a question I often came across when doing research, but it's also a practical question when doing machine learning.

tl;dr: Feature selection and feature interpretation are different modeling steps with different goals. They are, however, complementary, and if you get feature selection right, you can boost interpretability.

I’ve done a lot of research in the field of machine learning interpretability, especially about permutation feature importance.

Permutation feature importance: Shuffle a feature and measure performance on test data before and after shuffling. The importance of the feature is the resulting drop in performance.
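To make that concrete, here is a minimal sketch of the idea in Python (using scikit-learn and a synthetic dataset, both just for illustration). scikit-learn also ships a ready-made version as sklearn.inspection.permutation_importance.

```python
# Minimal sketch of permutation feature importance (illustrative, not the canonical implementation).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
baseline = r2_score(y_test, model.predict(X_test))

rng = np.random.default_rng(0)
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle one feature, keep the rest intact
    drop = baseline - r2_score(y_test, model.predict(X_perm))
    print(f"feature {j}: importance = {drop:.3f}")
```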

In research papers, it’s typical to have a related work section that surveys papers related to your research. When writing about permutation feature importance, I was always wondering whether or not to also mention feature selection methods.

Because the methods for selection and importance seemingly overlap:

  • In theory, you could use permutation feature importance for feature selection.

  • Some feature selection methods produce “scores” for the features (e.g. correlation or mutual information with target variable), which could be interpreted as feature importance.

  • Models like LASSO, for example, are used both for selecting features and as sparse, interpretable models.

Let’s look at both modeling steps more deeply.

Let’s first separate selection and interpretation (which feature importance is a part of) and figure out the goals of these two modeling steps.

In feature selection, our goal is to reduce the dimensionality of the input feature space. There are many methods to do so, ranging from filter methods based on correlation or mutual information to internal model constraints (like L1 regularization) and wrapper methods that train the model with different subsets of features to select the best one.
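To illustrate two of these flavors, here is a small sketch (synthetic data and scikit-learn; the choice of k=5 and of LassoCV are just examples):

```python
# Sketch of a filter method and an embedded (L1) method for feature selection.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Filter: keep the k features with the highest mutual information with the target.
filt = SelectKBest(score_func=mutual_info_regression, k=5).fit(X, y)
print("filter keeps features:", np.flatnonzero(filt.get_support()))

# Embedded: L1 regularization (LASSO) shrinks some coefficients to exactly zero.
lasso = LassoCV(cv=5).fit(X, y)
print("lasso keeps features:", np.flatnonzero(lasso.coef_ != 0))
```

A wrapper method would instead retrain the model on different feature subsets and compare their performance.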

The underlying motivations to reduce the feature space are also diverse, but the effect is the same: feature selection reduces the number of features used in the model.

Feature selection is usually done either before training the model or as part of the model training pipeline.

Let’s turn to interpretation, especially feature importance. The goal of feature importance is to rank and quantify each feature’s contribution to the model predictions and/or model performance. What “important” means depends on the importance method that is used. The methods range from “built-in” notions of importance, like standardized absolute regression coefficients in regression models, to permutation feature importance and SHAP importance.
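As one illustration, a global importance can be computed as the mean absolute SHAP value per feature. The sketch below assumes the shap package and a tree-based model; shap offers several explainer interfaces, and this is just one of them.

```python
# Sketch: global feature importance as the mean absolute SHAP value per feature.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)           # one attribution per feature and prediction
importance = np.abs(shap_values).mean(axis=0)    # average absolute attribution per feature
print(importance)
```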

The underlying motivations to understand feature importance can be:

  • Understand model behavior

  • Audit the model

  • Debug the model

  • Improve feature engineering

  • Understand the modeled phenomenon

Feature importance ranks the features.

Both lists of motivations overlap: We can use feature selection to improve interpretability, but we can also use model interpretation to debug the model and to do feature engineering, which might in turn affect feature selection.

So are the two modeling steps entangled after all?

My advice: Treat both as separate but complementary steps.

Feature selection is a pre-processing / model-constraining step that is mostly automated; feature interpretation is more of a post-hoc step that is more hands-on.

But that’s just the default starting point, because while selection and interpretation are separate steps, they are related.

Feature selection can be an important step that aids with the later interpretation. The fewer features we have in the model, the fewer plots to interpret, the fewer interactions, and the fewer correlated features. So if you find out that your model has too many features for a meaningful interpretation, it makes sense to enforce feature selection and reduce the number of features while also keeping an eye on model performance.

In addition, many feature selection methods throw out strongly correlated features. Perfect for interpretation, since correlated features can ruin interpretability.
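One common heuristic for this (just a sketch; the 0.95 threshold is an arbitrary choice for illustration) is to drop one feature of every highly correlated pair before training and interpreting:

```python
# Sketch: drop one feature of each highly correlated pair (assumes pandas).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((200, 4)), columns=["a", "b", "c", "d"])
df["b"] = df["a"] + 0.01 * rng.random(200)   # make "b" nearly a copy of "a"

corr = df.corr().abs()
# Look only at the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
print("dropping:", to_drop)                  # here: ["b"]
df_reduced = df.drop(columns=to_drop)
```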

Feature importance interpretation can also inform the selection step. However, I wouldn’t directly use feature importance measures for feature selection; we already have plenty of dedicated feature selection methods for weeding out non-performing features. Feature importance measures like SHAP importance can, however, allow you to make qualitative decisions to remove a feature: for example, by inspecting a feature you realize it’s a collider, or you notice that it might make your model less robust.

For an overview of feature importance methods and other interpretation tools, my book Interpretable Machine Learning is a good resource.
