[scikit-learn] feature importance calculation in gradient boosting
Olga Lyashevska
o.lyashevskaya at gmail.com
Thu Apr 20 05:39:10 EDT 2017
Thank you. It seems that information value can only be calculated for a
binary classification dataset; however, my response variable is continuous.
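
For a continuous response, the impurity that a regression tree reduces is the
variance of the target rather than Gini impurity. As far as I understand it,
the impurity-based importance of a feature is the sum of the (sample-weighted)
impurity decreases of the splits that use it, normalised to sum to 1 and
averaged over the trees. The sketch below is hand-rolled and is not
scikit-learn's code: the helper names and the toy data are made up for
illustration, and it only compares candidate splits at a single root node.

```python
import random
from statistics import pvariance


def variance_reduction(y, left, right):
    """Variance of the parent node minus the size-weighted variance of its children."""
    parent = left + right
    n = len(parent)
    children = (len(left) * pvariance([y[i] for i in left])
                + len(right) * pvariance([y[i] for i in right])) / n
    return pvariance([y[i] for i in parent]) - children


def best_root_gain(x, y):
    """Largest variance reduction reachable by a single threshold on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = 0.0
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold fits between equal values
        best = max(best, variance_reduction(y, order[:k], order[k:]))
    return best


# Toy data (made up for illustration): y depends on `real` only.
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
real = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)
noise = [rng.random() for _ in y]  # continuous, unrelated to y

gain_real = best_root_gain(real, y)
gain_noise = best_root_gain(noise, y)
total = gain_real + gain_noise
# Share of the candidate gain at this one node, not a fitted model's importances.
gain_share = {"real": gain_real / total, "noise": gain_noise / total}
print(gain_share)
```

A continuous random variable offers many candidate thresholds, so it can
usually find some split that lowers the in-sample variance a little. Deep
trees summed over many boosting stages may accumulate such small gains, which
is one possible reason a random column ends up with a non-trivial importance.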
On 20/04/17 05:51, urvesh patel wrote:
> I believe your random variable has some predictive power by chance. In
> R, use the Information package and check the information value of that
> randomly created variable. If it is > 0.05, then it has good predictive power.
> On Tue, Apr 18, 2017 at 7:47 AM Olga Lyashevska
> <o.lyashevskaya at gmail.com> wrote:
>
> Hi,
>
> I would like to understand how feature importances are calculated in
> gradient boosting regression.
>
> I know that these are the relevant functions:
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L1165
> https://github.com/scikit-learn/scikit-learn/blob/fc2f24927fc37d7e42917369f17de045b14c59b5/sklearn/tree/_tree.pyx#L1056
>
> From the literature and elsewhere I understand that Gini impurity is
> calculated. What is this exactly and how does it relate to 'gain' vs
> 'frequency' implemented in XGBoost?
> http://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html
>
> My problem is that when I fit exactly the same model in sklearn and gbm (R
> package), I get different variable importance plots. One of the variables,
> which was generated randomly (keeping all other variables real), appears
> to be very important in sklearn and very unimportant in gbm. How is it
> possible that a completely random variable gets the highest importance?
>
>
> Many thanks,
> Olga
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn