[scikit-learn] Gradient Boosting: Feature Importances do not sum to 1

Douglas Chan douglas.chan at ieee.org
Wed Aug 31 19:52:17 EDT 2016

Thanks for your reply, Raphael.

Here’s some code using the Boston dataset to reproduce this.  

=== START CODE ===
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor

boston = datasets.load_boston()
X, Y = (boston.data, boston.target)

# From n_estimators = 712 onwards, the feature importance sum drops below 1.
n_estimators = 712

params = {'n_estimators': n_estimators, 'max_depth': 6, 'learning_rate': 0.1}
clf = GradientBoostingRegressor(**params)
clf.fit(X, Y)

# The importances are expected to sum to 1.
feature_importance_sum = np.sum(clf.feature_importances_)
print("At n_estimators = %i, feature importance sum = %f" % (n_estimators, feature_importance_sum))

=== END CODE ===
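
One explanation that may be worth ruling out before blaming floating point is a sketch under a stated assumption, not a description of scikit-learn's actual code: suppose the ensemble importance is the plain mean of per-tree normalized importances, and a tree that makes no splits reports all zeros. Every such tree would then pull the sum below 1 by 1/n_estimators, which is far more than rounding error. The helper names and the tree counts below are hypothetical and chosen only for illustration.

```python
# Hypothetical model (an assumption, not scikit-learn's implementation):
# the ensemble importance is the plain mean of per-tree importance vectors.


def tree_importances(raw_gains):
    """Normalize one tree's raw split gains; a tree with no splits stays all zeros."""
    total = sum(raw_gains)
    if total == 0.0:
        return [0.0] * len(raw_gains)
    return [g / total for g in raw_gains]


def ensemble_importances(trees):
    """Average the per-tree vectors over every tree, including the empty ones."""
    n_features = len(trees[0])
    return [sum(t[j] for t in trees) / len(trees) for j in range(n_features)]


# Made-up counts: 700 trees that still find splits, then 12 trees with none.
useful = [tree_importances([3.0, 1.0, 0.0, 4.0]) for _ in range(700)]
empty = [tree_importances([0.0, 0.0, 0.0, 0.0]) for _ in range(12)]

importance_sum = sum(ensemble_importances(useful + empty))
print("sum with empty trees:   %f" % importance_sum)
print("fraction of useful trees: %f" % (700.0 / 712.0))
```

If that assumption holds, the late-stage trees in the fitted model would be single-node trees, which could be checked through clf.estimators_[i, 0].tree_.node_count. It would also be consistent with the sum shrinking further as more estimators are added.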

If we deem this to be an error, I can open a bug to track it.  Please share your thoughts on it.

Thank you,

From: Raphael C 
Sent: Tuesday, August 30, 2016 11:28 PM
To: Scikit-learn user and developer mailing list 
Subject: Re: [scikit-learn] Gradient Boosting: Feature Importances do not sum to 1

Can you provide a reproducible example? 

On Wednesday, August 31, 2016, Douglas Chan <douglas.chan at ieee.org> wrote:

  Hello everyone,

  I have noticed conditions under which the feature importance values do not add up to 1 in tree ensemble methods such as Gradient Boosting Trees or AdaBoost Trees.  I wonder if there’s a bug in the code.

  This error occurs when the ensemble has a large number of estimators.  The exact threshold depends on several conditions.  For example, the error shows up sooner with fewer training samples, or when the trees are deep.

  When this error appears, the predicted value seems to have converged.  But it’s unclear whether the error is what stops the predicted value from changing as more estimators are added.  Beyond that point, the feature importance sum keeps decreasing with each additional estimator.

  I wonder if we’re hitting some floating point calculation error. 

  Looking forward to hearing your thoughts on this.

  Thank you!
