[scikit-learn] impurity criterion in gradient boosted regression trees
jmschreiber91 at gmail.com
Thu May 11 19:38:13 EDT 2017
The blog post from Matthew Drury sums it up well. The feature importance is
indeed the Gini impurity.
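
For the "how exactly is this calculated" part of the question, here is a
minimal sketch of the weighted impurity reduction for a single split. It
assumes the node impurity of a regression tree is the variance (mean squared
error) of the targets in the node, not the classification Gini index, and it
follows the weighted impurity decrease formula given in the scikit-learn
docs for min_impurity_decrease. The function names are made up for
illustration and are not scikit-learn's API.

```python
def mse_impurity(y):
    """Variance of the targets in a node (mean squared error around the mean)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def weighted_impurity_decrease(y_parent, y_left, y_right, n_total):
    """Impurity reduction of one split, weighted by the share of samples
    that reach the parent node (hypothetical helper, for illustration)."""
    n_t = len(y_parent)
    n_l = len(y_left)
    n_r = len(y_right)
    return (n_t / n_total) * (
        mse_impurity(y_parent)
        - (n_l / n_t) * mse_impurity(y_left)
        - (n_r / n_t) * mse_impurity(y_right)
    )


# Toy node: a split that separates the targets perfectly.
y = [1.0, 1.0, 3.0, 3.0]
decrease = weighted_impurity_decrease(y, y[:2], y[2:], n_total=len(y))
print(decrease)  # 1.0
```

As far as I understand it, a feature's importance is then the sum of these
decreases over all nodes that split on that feature, normalized and averaged
over the trees in the ensemble.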
On Tue, May 9, 2017 at 8:34 AM, Olga Lyashevska <o.lyashevskaya at gmail.com> wrote:
> Hi all,
> I am trying to understand the differences in feature importance plots
> obtained with the R package gbm and with sklearn. Having compared both
> implementations side by side, the models seem fairly similar; however, the
> feature importance plots are rather distinct.
> R uses the empirical improvement in squared error, as described in
> Friedman's "Greedy Function Approximation" paper (eq. 44, 45).
> sklearn (as far as I could see in the code) uses the weighted reduction
> in node impurity. How exactly is this calculated? Is it the Gini index? Is
> there a reference?
> I found this, but I find this hard to follow:
> I have also seen a post by Matthew Drury on stack exchange:
> Many thanks,