<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

</head>

<body bgcolor="#FFFFFF" text="#000000">

Hi Doug,<br>

<br>

I modified your code slightly to compute the feature_importances_ of every tree in a forest.<br>

In my opinion these feature importances should also sum to 1.0.<br>

<br>

Since I could not access the individual DecisionTreeRegressor objects of your GradientBoostingRegressor, I created a new ExtraTreesRegressor.<br>

<br>

This is a bit off topic, but does anyone know why <br>

type(ExtraTreesRegressor().estimators_) <br>

results in a list and <br>

type(GradientBoostingRegressor().estimators_)<br>

results in an np.array?<br>
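<br>
On accessing the individual trees: a minimal sketch, assuming a recent scikit-learn in which a fitted GradientBoostingRegressor stores its trees in a 2-D array with a single column for regression, so the array can be flattened and iterated much like the forest's list. The synthetic data from make_regression and the small parameter values are assumptions for illustration, not the Boston setup used in this thread.<br>

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: an assumption for illustration, not the Boston data from this thread.
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

n_estimators = 50
clf = GradientBoostingRegressor(n_estimators=n_estimators, max_depth=3,
                                learning_rate=0.1, random_state=0)
clf.fit(X, y)

# For a regressor, estimators_ is a 2-D array holding one tree per boosting stage.
print(type(clf.estimators_), clf.estimators_.shape)

# Flatten the array to visit every DecisionTreeRegressor in the ensemble.
per_tree_sums = [np.sum(tree.feature_importances_)
                 for tree in clf.estimators_.ravel()]

# Count trees whose importances are all zero (for example, trees without a split).
n_zero = sum(1 for s in per_tree_sums if s == 0.0)
print("trees:", len(per_tree_sums), "with all-zero importances:", n_zero)
print("ensemble importance sum: %.20f" % np.sum(clf.feature_importances_))
```

With this, the per-tree check applied to the forest can be applied to the boosted trees as well, without a substitute model.<br>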

<br>

Anyway, here is the code:<br>

<br>

import numpy as np<br>

from sklearn import datasets<br>

from sklearn.ensemble import GradientBoostingRegressor, ExtraTreesRegressor<br>

 <br>

boston = datasets.load_boston()<br>

X, Y = (boston.data, boston.target)<br>

 <br>

n_estimators = 712  <br>

# Note: From 712 onwards, the feature importance sum is less than 1<br>

params = {'n_estimators': n_estimators, 'max_depth': 6, 'learning_rate': 0.1}<br>

clf = GradientBoostingRegressor(**params)<br>

clf.fit(X, Y)<br>

 <br>

feature_importance_sum = np.sum(clf.feature_importances_)<br>

print("At n_estimators = %i, feature importance sum = %.20f" % (n_estimators, feature_importance_sum))<br>

<br>

<br>

n_estimators_forest = 100<br>

clf_forest = ExtraTreesRegressor(n_estimators=n_estimators_forest)<br>

clf_forest.fit(X, Y)<br>

<br>

feature_importance_sum_forest = np.sum(clf_forest.feature_importances_)<br>

forest_feat_imp = [np.sum(tree.feature_importances_) for tree in clf_forest.estimators_]<br>

print("At n_estimators = %i, feature importance sum = %.20f" % (n_estimators_forest, feature_importance_sum_forest))<br>

for idx, imp in enumerate(forest_feat_imp):<br>

    print("imp for tree %i: %.20f" % (idx, imp))<br>

<br>

<br>

I suppose each tree contributes a small rounding error, and these add up to the overall error.<br>

So is this a bug or an inevitable rounding issue?<br>
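<br>
To get a feel for the two possibilities, here is a rough sketch in plain Python. It rests on two assumptions that are not confirmed in this thread: that the ensemble importance is the plain average of the per-tree vectors, and that a tree without any split reports all-zero importances. The vectors are random numbers, not the output of a fitted model, and the helper names normalize and ensemble_importances are made up for this illustration.<br>

```python
import random

def normalize(values):
    """Scale values so that they sum to 1 (up to floating-point rounding)."""
    total = sum(values)
    return [v / total for v in values]

def ensemble_importances(trees):
    """Average the per-tree importance vectors feature by feature."""
    return [sum(column) / len(trees) for column in zip(*trees)]

random.seed(0)
n_features = 13  # the Boston data set has 13 features

# Hypothetical per-tree vectors: random numbers, not output of a fitted model.
split_trees = [normalize([random.random() for _ in range(n_features)])
               for _ in range(700)]

# Assumed behaviour: a tree without any split reports all-zero importances.
stumps = [[0.0] * n_features for _ in range(300)]

rounding_only = sum(ensemble_importances(split_trees))
with_stumps = sum(ensemble_importances(split_trees + stumps))

print("all trees split:  %.20f" % rounding_only)
print("30%% stumps:       %.20f" % with_stumps)
```

Under these assumptions, rounding alone moves the sum only in the last few digits, while all-zero trees pull it down in proportion to their share of the ensemble.<br>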

<br>

<br>

Greets,<br>

Piotr<br>

<br>

<div class="moz-cite-prefix">On 09.09.2016 03:51, Douglas Chan wrote:<br>

</div>

<blockquote cite="mid:4C639AAD236E493AABC55F2B4709CAEF@Serendipitous" type="cite">

<div dir="ltr">

<div style="FONT-SIZE: 10pt; FONT-FAMILY: 'Lucida Sans'; COLOR:

          #000000">

<div>Hello everyone,</div>

<div> </div>

<div>I’d like to bring this up again to see if people have any thoughts on it.</div>

<div> </div>

<div>If you also think this is a bug, then we can track it and get it fixed.  Please share your opinions.</div>

<div> </div>

<div>Thank you,</div>

<div>-Doug</div>

<div> </div>

<div style="FONT-SIZE: small; TEXT-DECORATION: none;

            FONT-FAMILY: 'Calibri'; FONT-WEIGHT: normal;

            COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">

<div style="FONT: 10pt tahoma">

<div> </div>

<div style="BACKGROUND: #f5f5f5">

<div style="font-color: black"><b>From:</b> <a moz-do-not-send="true" title="douglas.chan@ieee.org" href="mailto:douglas.chan@ieee.org">

Douglas Chan</a> </div>

<div><b>Sent:</b> Wednesday, August 31, 2016 4:52 PM</div>

<div><b>To:</b> <a moz-do-not-send="true" title="scikit-learn@python.org" href="mailto:scikit-learn@python.org">

Scikit-learn user and developer mailing list</a> ; <a moz-do-not-send="true" title="drraph@gmail.com" href="mailto:drraph@gmail.com">

Raphael C</a> </div>

<div><b>Subject:</b> Re: [scikit-learn] Gradient Boosting: Feature Importances do not sum to 1</div>

</div>

</div>

<div> </div>

</div>

<div style="FONT-SIZE: small; TEXT-DECORATION: none;

            FONT-FAMILY: 'Calibri'; FONT-WEIGHT: normal;

            COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">

<div dir="ltr">

<div style="FONT-SIZE: 10pt; FONT-FAMILY: 'Lucida Sans';

                COLOR: #000000">

<div>Thanks for your reply, Raphael.</div>

<div> </div>

<div>Here’s some code using the Boston dataset to reproduce this.  </div>

<div> </div>

<div>=== START CODE ===</div>

<div>import numpy as np</div>

<div>from sklearn import datasets</div>

<div>from sklearn.ensemble import GradientBoostingRegressor</div>

<div> </div>

<div>boston = datasets.load_boston()</div>

<div>X, Y = (boston.data, boston.target)</div>

<div> </div>

<div>n_estimators = 712   </div>

<div># Note: From 712 onwards, the feature importance sum is less than 1</div>

<div> </div>

<div>params = {'n_estimators': n_estimators, 'max_depth': 6, 'learning_rate': 0.1}</div>

<div>clf = GradientBoostingRegressor(**params)</div>

<div>clf.fit(X, Y)</div>

<div> </div>

<div>feature_importance_sum = np.sum(clf.feature_importances_)</div>

<div>print("At n_estimators = %i, feature importance sum = %f" % (n_estimators, feature_importance_sum))</div>

<div> </div>

<div>=== END CODE ===</div>

<div> </div>

<div>If we deem this to be an error, I can open a bug to track it.  Please share your thoughts on it.</div>

<div> </div>

<div>Thank you,</div>

<div>-Doug</div>

<div> </div>

<div style="FONT-SIZE: small; TEXT-DECORATION: none;

                  FONT-FAMILY: 'Calibri'; FONT-WEIGHT: normal;

                  COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">

<div style="FONT: 10pt tahoma">

<div> </div>

<div style="BACKGROUND: #f5f5f5">

<div style="font-color: black"><b>From:</b> <a moz-do-not-send="true" title="drraph@gmail.com" href="mailto:drraph@gmail.com">

Raphael C</a> </div>

<div><b>Sent:</b> Tuesday, August 30, 2016 11:28 PM</div>

<div><b>To:</b> <a moz-do-not-send="true" title="scikit-learn@python.org" href="mailto:scikit-learn@python.org">

Scikit-learn user and developer mailing list</a> </div>

<div><b>Subject:</b> Re: [scikit-learn] Gradient Boosting: Feature Importances do not sum to 1</div>

</div>

</div>

<div> </div>

</div>

<div style="FONT-SIZE: small; TEXT-DECORATION: none;

                  FONT-FAMILY: 'Calibri'; FONT-WEIGHT: normal;

                  COLOR: #000000; FONT-STYLE: normal; DISPLAY: inline">

Can you provide a reproducible example?

<div>Raphael<br>

<br>

On Wednesday, August 31, 2016, Douglas Chan &lt;<a class="moz-txt-link-abbreviated" href="mailto:douglas.chan@ieee.org">douglas.chan@ieee.org</a>&gt; wrote:<br>

<blockquote class="gmail_quote" style="PADDING-LEFT:

                      1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc

                      1px solid">

<div dir="ltr">

<div dir="ltr">

<div style="FONT-SIZE: 10pt; FONT-FAMILY:

                            'Lucida Sans'; COLOR: #000000">

<div>Hello everyone,</div>

<div>

<div> </div>

<div>I have noticed that, under certain conditions, the feature importance values do not add up to 1 in tree ensemble methods such as Gradient Boosting Trees or AdaBoost Trees.  I wonder if there’s a bug in the code.</div>

<div> </div>

<div>This error occurs when the ensemble has a large number of estimators.  The exact conditions vary.  For example, the error shows up sooner with fewer training samples, or when the depth of the trees is large. 

</div>

<div> </div>

<div>When this error appears, the predicted value seems to have converged.  But it’s unclear if the error is causing the predicted value not to change with more estimators.  In fact, the feature importance sum goes lower and lower with more estimators thereafter.</div>

<div> </div>

<div>I wonder if we’re hitting some floating point calculation error. </div>

<div> </div>

<div>Looking forward to hearing your thoughts on this.</div>

<div> </div>

<div>Thank you!</div>

<div>-Doug</div>

<div> </div>

</div>

</div>

</div>

</div>

</blockquote>

</div>

<p></p>

<hr>

_______________________________________________<br>

scikit-learn mailing list<br>

<a class="moz-txt-link-abbreviated" href="mailto:scikit-learn@python.org">scikit-learn@python.org</a><br>

<a class="moz-txt-link-freetext" href="https://mail.python.org/mailman/listinfo/scikit-learn">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>

</div>

</div>

</div>

</div>

</div>

</div>

<br>


</blockquote>

<br>

</body>

</html>