[scikit-learn] feature importance calculation in gradient boosting

Olga Lyashevska o.lyashevskaya at gmail.com
Thu Apr 20 05:39:10 EDT 2017


Thank you. It seems that information value can only be calculated for a
binary classification dataset; however, my response variable is continuous.
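
For a continuous response, one check that does not rely on information
value is permutation importance measured on held-out data. The following is
a minimal sketch, not the model from this thread: the data are synthetic, the
feature names are hypothetical, and it assumes scikit-learn 0.22 or later,
where sklearn.inspection.permutation_importance is available.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration only).
rng = np.random.RandomState(0)
n = 500
x_signal = rng.normal(size=n)   # hypothetical informative feature
x_weak = rng.normal(size=n)     # hypothetical weakly informative feature
x_random = rng.normal(size=n)   # pure noise, unrelated to y
X = np.column_stack([x_signal, x_weak, x_random])
y = 3.0 * x_signal + 0.5 * x_weak + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Impurity-based importances: derived from the splits made on training data.
impurity_importance = model.feature_importances_

# Permutation importance: drop in the model's score (R^2 by default) on
# held-out data when a single column is shuffled.
perm = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

for name, imp, p in zip(
    ["x_signal", "x_weak", "x_random"],
    impurity_importance,
    perm.importances_mean,
):
    print(f"{name:10s} impurity={imp:.3f} permutation={p:.3f}")
```

If a randomly generated variable ranks high on the impurity-based measure but
close to zero on the permutation measure, that suggests the trees are
splitting on noise in the training data rather than on real signal.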



On 20/04/17 05:51, urvesh patel wrote:
> I believe your random variable has some predictive power by chance. In
> R, use the Information package and check the information value of that
> randomly created variable. If it is > 0.05, then it has good predictive power.
> On Tue, Apr 18, 2017 at 7:47 AM Olga Lyashevska
> <o.lyashevskaya at gmail.com> wrote:
>
>     Hi,
>
>     I would like to understand how feature importances are calculated in
>     gradient boosting regression.
>
>     I know that these are the relevant functions:
>     https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L1165
>     https://github.com/scikit-learn/scikit-learn/blob/fc2f24927fc37d7e42917369f17de045b14c59b5/sklearn/tree/_tree.pyx#L1056
>
>     From the literature and elsewhere I understand that Gini impurity is
>     calculated. What is this exactly, and how does it relate to the 'gain'
>     and 'frequency' measures implemented in XGBoost?
>     http://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html
>
>     My problem is that when I fit exactly the same model in sklearn and gbm
>     (R package), I get different variable importance plots. One of the
>     variables, which was generated randomly (keeping all other variables
>     real), appears to be very important in sklearn and very unimportant in
>     gbm. How is it possible that a completely random variable gets the
>     highest importance?
>
>
>     Many thanks,
>     Olga
>     _______________________________________________
>     scikit-learn mailing list
>     scikit-learn at python.org
>     https://mail.python.org/mailman/listinfo/scikit-learn
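
On the 'gain' vs 'frequency' question in the quoted message: for regression
trees the impurity is a variance-type criterion (Gini applies to
classification trees). An impurity-based importance sums, for each feature,
the sample-weighted decrease in impurity over every split on that feature,
whereas a frequency-style measure only counts how often the feature is used
to split. The sketch below works through both on a hand-written toy tree. All
node values and helper names are made up for illustration, and XGBoost's
'gain' is defined on its own loss reduction, so this is an analogy rather
than its exact formula.

```python
def leaf(n, impurity):
    """A terminal node with n samples and the given impurity."""
    return {"feature": None, "n": n, "impurity": impurity}


def split(feature, n, impurity, left, right):
    """An internal node splitting on `feature`; left/right are node indices."""
    return {"feature": feature, "n": n, "impurity": impurity,
            "left": left, "right": right}


def importances(nodes, n_features):
    """Return (gain-style, frequency-style) importances, each summing to 1."""
    gain = [0.0] * n_features
    freq = [0.0] * n_features
    root_n = nodes[0]["n"]
    for node in nodes:
        if node["feature"] is None:
            continue  # leaves do not split, so they contribute nothing
        left, right = nodes[node["left"]], nodes[node["right"]]
        # Sample-weighted impurity decrease achieved by this split.
        decrease = (node["n"] * node["impurity"]
                    - left["n"] * left["impurity"]
                    - right["n"] * right["impurity"]) / root_n
        gain[node["feature"]] += decrease
        freq[node["feature"]] += 1.0
    gain_total, freq_total = sum(gain), sum(freq)
    return ([g / gain_total for g in gain],
            [f / freq_total for f in freq])


# Toy tree: one large split on feature 0, two small splits on feature 1.
toy_tree = [
    split(0, 100, 10.0, 1, 2),  # node 0 (root)
    split(1, 60, 4.0, 3, 4),    # node 1
    split(1, 40, 2.0, 5, 6),    # node 2
    leaf(30, 3.0),              # node 3
    leaf(30, 4.0),              # node 4
    leaf(20, 1.5),              # node 5
    leaf(20, 2.0),              # node 6
]

gain_imp, freq_imp = importances(toy_tree, n_features=2)
print("gain-style:     ", gain_imp)
print("frequency-style:", freq_imp)
```

In this toy tree feature 0 dominates the gain-style ranking while feature 1
dominates the frequency-style ranking, so two packages that report different
measures can rank the same variable very differently.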
