Variable importances
Variable importance (also known as feature importance) is a score that indicates how "important" a feature is to the model. For example, if a model with two input features "f1" and "f2" has the variable importances {f1=5.8, f2=2.5}, then feature "f1" is more "important" to the model than feature "f2". As with other machine learning models, variable importance is a simple way to understand how a decision tree works.
You can apply model-agnostic variable importances, such as permutation variable importances, to decision trees.
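As a concrete illustration, the following is a minimal NumPy sketch of permutation variable importance. The `model` object, the `score_fn(model, X, y)` scoring callback, and the array layout are assumptions made for this example; the idea is simply to shuffle one feature column at a time and measure how much the model's score drops.

```python
import numpy as np

def permutation_importance(model, X, y, score_fn, rng=None):
    """Sketch of model-agnostic permutation importance.

    Shuffles one feature column at a time and reports the drop in the
    model's score; a larger drop suggests a more important feature.
    """
    rng = rng or np.random.default_rng(0)
    baseline = score_fn(model, X, y)
    importances = {}
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        # Shuffle column j in place to break its link with the label.
        rng.shuffle(X_shuffled[:, j])
        importances[j] = baseline - score_fn(model, X_shuffled, y)
    return importances
```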
Decision trees also have specific variable importances, such as the following (the last two are sketched in code after the list):
- The sum of the split scores of the conditions that use a given variable.
- The number of nodes with a given variable.
- The average depth of the first occurrence of a feature across all the tree paths.
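The toy sketch below (not YDF's implementation) shows how the last two importances could be computed for a single tree: counting the nodes that test each feature, and averaging the depth of a feature's first occurrence over all root-to-leaf paths. The `Node` structure is an assumption made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Toy decision-tree node; `feature` is None for leaves."""
    feature: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def node_count_importance(root: Node) -> Counter:
    """Number of nodes that test each feature."""
    counts = Counter()
    def visit(node):
        if node is None or node.feature is None:
            return
        counts[node.feature] += 1
        visit(node.left)
        visit(node.right)
    visit(root)
    return counts

def mean_first_occurrence_depth(root: Node) -> dict:
    """Average depth of the first occurrence of each feature over the
    root-to-leaf paths that contain it (smaller means more important)."""
    depths = defaultdict(list)
    def visit(node, depth, seen):
        if node is None or node.feature is None:  # reached a leaf: record this path
            for feature, d in seen.items():
                depths[feature].append(d)
            return
        seen = dict(seen)
        seen.setdefault(node.feature, depth)  # first occurrence on this path
        visit(node.left, depth + 1, seen)
        visit(node.right, depth + 1, seen)
    visit(root, 0, {})
    return {f: sum(ds) / len(ds) for f, ds in depths.items()}

# Example: "f1" at the root, "f2" one level deeper on the right branch.
tree = Node("f1", left=Node(), right=Node("f2", left=Node(), right=Node()))
print(node_count_importance(tree))        # Counter({'f1': 1, 'f2': 1})
print(mean_first_occurrence_depth(tree))  # {'f1': 0.0, 'f2': 1.0}
```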
Variable importances can differ by qualities such as:
- semantics
- scale
- properties
Furthermore, variable importances provide different types of information about:
- the model
- the dataset
- the training process
For example, the number of conditions containing a specific feature indicates how much a decision tree is looking at this specific feature, which might indicate variable importance. After all, the learning algorithm would not have used a feature in multiple conditions if it did not matter. However, the same feature appearing in multiple conditions might also indicate that a model is trying but failing to generalize the pattern of a feature. For example, this can happen when a feature is just an example identifier with no information to generalize.
On the other hand, a high permutation variable importance indicates that removing a feature hurts the model, which is an indication of variable importance. However, if the model is robust, removing any one feature might not hurt the model.
Because different variable importances inform about different aspects of the model, looking at several variable importances at the same time is informative. For example, if a feature is important according to all the variable importances, this feature is likely important. As another example, if a feature has a high "number of nodes" variable importance and a small "permutation" variable importance, then this feature might be hard to generalize and can hurt the model quality.
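For instance, a small helper along these lines (hypothetical names, taking dictionaries such as those produced by the sketches above) could flag features whose rank differs strongly between two importance measures:

```python
def compare_importances(importance_a: dict, importance_b: dict,
                        name_a: str = "node count", name_b: str = "permutation"):
    """Flag features whose rank disagrees between two importance measures.

    Both inputs are hypothetical dicts mapping feature name to score.
    """
    def rank(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {feature: position for position, feature in enumerate(ordered)}

    rank_a, rank_b = rank(importance_a), rank(importance_b)
    for feature in sorted(set(rank_a) & set(rank_b)):
        if abs(rank_a[feature] - rank_b[feature]) >= 2:
            print(f"{feature}: rank {rank_a[feature]} by {name_a} vs "
                  f"rank {rank_b[feature]} by {name_b} -- worth a closer look")
```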
YDF Code
In YDF, you can see the variable importance of a model by calling model.describe() and looking at the "variable importance" tab. See the Model understanding tutorial for more details.
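A minimal end-to-end sketch is shown below. The toy data and column names are made up for this example, and the programmatic variable_importances() call is an assumption about the YDF Python API; model.describe() is the call mentioned above.

```python
import pandas as pd
import ydf  # pip install ydf

# Toy training data with hypothetical column names.
train_ds = pd.DataFrame({
    "f1": [0.1, 0.9, 0.4, 0.8, 0.2, 0.7],
    "f2": [1.0, 0.2, 0.6, 0.3, 0.9, 0.1],
    "label": ["no", "yes", "no", "yes", "no", "yes"],
})

model = ydf.RandomForestLearner(label="label").train(train_ds)

# In a notebook, this renders a report that includes a "variable importance" tab.
model.describe()

# Variable importances can also be read programmatically; this method name is
# an assumption about the YDF Python API.
print(model.variable_importances())
```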