[scikit-learn] what is "value" in the nodes of trees in a gbm?

Sole Galli solegalli at protonmail.com
Tue Oct 31 11:42:03 EDT 2023


Hello again,

I sorted things out in my head and I got at least a few numbers to match:

In the first tree, we are fitting against y_train, hence:

```
# first tree, top node, square error

np.mean((y_train - y_train.mean())**2)
```

returns as expected:

1.3308271915828764

And the value, I assume, is given by:

```
np.mean(-2 * (y_train - y_train.mean()))

```

Is this correct?

The second tree is fit against the residuals. Hence:

```
# second tree, top node, square error
residuals = y_train - 0.1 * gbm.estimators_[0][0].predict(X_train)
np.mean((residuals - np.mean(residuals))**2)
```

returns the expected result, but now the value is not what I expect:

```
np.mean(-2 * (residuals - np.mean(residuals)))

```

So I guess that is not how the value in the top node of the second tree is calculated?
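
One way to check this by hand is sketched below. It assumes the default squared-error loss, under which boosting starts from a constant prediction (the training mean, exposed as `gbm.init_`) and each tree is fit on the residuals `y_train - current_prediction`. The synthetic data, the hyperparameters, and the names `raw_prediction` and `root_checks` are illustrative assumptions, not taken from the notebook in this thread.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the X_train / y_train of this thread (an assumption).
X_train, y_train = make_regression(
    n_samples=200, n_features=5, noise=10.0, random_state=0
)

gbm = GradientBoostingRegressor(
    n_estimators=3, learning_rate=0.1, max_depth=2, random_state=0
)
gbm.fit(X_train, y_train)

# Boosting starts from the constant prediction of the init estimator
# (the training mean for squared error), not from zero.
raw_prediction = gbm.init_.predict(X_train).ravel().astype(float)

root_checks = []
for stage in range(len(gbm.estimators_)):
    tree = gbm.estimators_[stage][0]

    # Residuals (negative gradient of the squared-error loss) seen by this tree.
    residuals = y_train - raw_prediction

    root_value = tree.tree_.value[0, 0, 0]   # "value" shown in the top node
    root_impurity = tree.tree_.impurity[0]   # "squared_error" shown in the top node
    root_checks.append(
        (root_value, residuals.mean(), root_impurity, residuals.var())
    )
    print(
        f"tree {stage}: value={root_value:.3e} vs mean={residuals.mean():.3e}, "
        f"squared_error={root_impurity:.6f} vs var={residuals.var():.6f}"
    )

    # Add this tree's shrunken contribution before moving to the next stage.
    raw_prediction += gbm.learning_rate * tree.predict(X_train)
```

Under these assumptions, the root `value` of each tree should match the mean of the current residuals, and the root `squared_error` their variance. Omitting the initial constant shifts the residuals by `y_train.mean()`, which leaves the variance unchanged but changes the mean; that would be consistent with the squared error matching while the value does not.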

Thank you!

Sent with [Proton Mail](https://proton.me/) secure email.

------- Original Message -------
On Tuesday, October 31st, 2023 at 3:51 PM, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:

> You probably want to look at the following example section:
>
> https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html#plot-training-deviance
>
> On Tue, 31 Oct 2023 at 14:52, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:
>
>> Hi Nicolas,
>>
>> Thank you so much for the links and explanation. I really appreciate it.
>>
>> I am struggling to reproduce the results though. There's probably something I don't understand.
>>
>> This is an image of the top node, of the first tree in the ensemble (GradientBoostingRegressor):
>>
>> [Screenshot 2023-10-31 at 14-39-06 4-gbm-local - Jupyter Notebook.png]
>>
>> How can I manually obtain the values for squared_error and value?
>>
>> I thought squared_error would be:
>>
>> np.mean( (y_train - 0.1 * gbm.estimators_[0][0].predict(X_train))**2)
>>
>> And value would be:
>>
>> -2 * (y_train - 0.1 * gbm.estimators_[0][0].predict(X_train))
>>
>> But those calculations do not return the numbers shown in the node.
>>
>> Is there something obvious that I am doing wrong?
>>
>> Thanks a lot!
>>
>> Best
>> Sole
>>
>> ------- Original Message -------
>> On Monday, October 30th, 2023 at 5:34 PM, Nicolas Hug <niourf at gmail.com> wrote:
>>
>>> The node values in GBDTs are an aggregation (typically a regularized average) of the gradients of the samples in that node.
>>>
>>> Each sample (x, y) is associated with a gradient computed as grad = d_loss(pred(x), y) / d_pred(x). These gradients are in the same physical dimension as the target (for regression). Some resources that may help:
>>>
>>> - https://explained.ai/gradient-boosting/descent.html
>>> - https://nicolas-hug.com/blog/gradient_boosting_descent (self plug)
>>>
>>> Nicolas
>>>
>>> On 30/10/2023 16:09, Sole Galli via scikit-learn wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I am trying to interpret the outputs of gradient boosting machines sample per sample.
>>>>
>>>> What does the "value" in each node of each tree in a gbm regressor mean?
>>>>
>>>> [Untitled.png]
>>>>
>>>> In random forests, value is the mean target value of the observations seen at that node. At the top node it is usually the mean target value of the train set (or bootstrapped sample). As it goes down the leaves it is the mean target value of the samples at each child.
>>>>
>>>> But in gradient boosting machines it is different. And I can't decipher how it is calculated.
>>>>
>>>> I expected the value in the first tree at the top node to be zero, because the residuals of the first tree are zero. But it is not exactly zero.
>>>>
>>>> In summary, how is the value at each node / tree calculated?
>>>>
>>>> Thanks a lot!!!
>>>>
>>>> Warm regards,
>>>> Sole
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org
>>>>
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> --
>
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Untitled.png
Type: image/png
Size: 493231 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20231031/9f56c56c/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2023-10-31 at 14-39-06 4-gbm-local - Jupyter Notebook.png
Type: image/png
Size: 39833 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/scikit-learn/attachments/20231031/9f56c56c/attachment-0003.png>