Discrepancy in "Feature importances with a forest of trees" documentation
Dear Scikit-learn community,

I have been reading the example at https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importanc... about the feature importances that can be assessed after fitting a tree-based model (e.g. RandomForestClassifier). However, I have noticed a discrepancy that I would like to mention.

If a one-hot-encoding step is used before model fitting, the `.feature_importances_` attribute includes importances for all the levels of the transformed categorical features (e.g. for gender, we get two importances, one for males and one for females). When I apply the `permutation_importance` function (https://scikit-learn.org/stable/modules/generated/sklearn.inspection.permutation_importance.html#sklearn.inspection.permutation_importance), however, the outputs correspond to the non-transformed data.

To illustrate this, I include a toy example in .py format.

Best,
Makis
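Since the attached .py file is not included in the archive, here is a minimal sketch of the behaviour described above. The column names and data are invented for illustration; the point is only to show that the impurity-based `.feature_importances_` attribute has one entry per *transformed* (one-hot) column, while `permutation_importance` applied to the whole pipeline permutes the *original* input columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
# Toy data: one categorical column and one numeric column (made up).
X = pd.DataFrame({
    "gender": rng.choice(["male", "female"], size=200),
    "age": rng.randint(18, 80, size=200),
})
y = rng.randint(0, 2, size=200)

# One-hot-encode the categorical column, pass the numeric one through.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["gender"])], remainder="passthrough"
)
model = Pipeline([
    ("pre", pre),
    ("rf", RandomForestClassifier(random_state=0)),
])
model.fit(X, y)

# Impurity-based importances: one value per transformed column
# (gender=male, gender=female, age) -> 3 values.
print(len(model.named_steps["rf"].feature_importances_))

# Permutation importance shuffles the pipeline's input columns,
# so we get one value per original column (gender, age) -> 2 values.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(len(result.importances_mean))
```

This is expected behaviour rather than a bug: the forest only ever sees the transformed feature matrix, whereas `permutation_importance` operates on whatever is passed as `X`, here the untransformed DataFrame.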
participants (1)
-
Serafeim Loukas