From solegalli at protonmail.com Wed Nov 1 06:58:05 2023
From: solegalli at protonmail.com (Sole Galli)
Date: Wed, 01 Nov 2023 10:58:05 +0000
Subject: [scikit-learn] what is "value" in the nodes of trees in a gbm?
In-Reply-To:
References:
Message-ID: <1TJAlJA2GciQtyqy5QSTC8Q0FsmnQ2_ipTtdsQLMTO50ezopzQsVxlhko6AofLQb4lF6a5PE_Wp6E9TY8Er-z90HwqYmkMHyBUeY10JyDUQ=@protonmail.com>
Thank you Guillaume,
I dug a bit and found that sum_total is the sum of some weighted y:
```
for k in range(self.n_outputs):
    y_ik = self.y[i, k]
    w_y_ik = w * y_ik
    self.sum_total[k] += w_y_ik
```
Is w the learning rate? And what is y exactly?
Thank you!
Best wishes
Sole
Sent with [Proton Mail](https://proton.me/) secure email.
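For reference, here is a minimal Python sketch of what the two Cython snippets in this thread appear to compute together: sum_total accumulates w * y over the samples in a node, and node_value divides that by the weighted sample count. It assumes that w is the per-sample weight (1.0 for every sample when no sample_weight is passed) and that y is whatever target the tree was fit on. In a gradient boosting regressor with squared error loss that target is, as far as I understand, the residual between the true target and the current prediction. The function name node_value_sketch and the toy numbers are hypothetical, not part of scikit-learn.

```python
import numpy as np


def node_value_sketch(y, sample_weight=None):
    """Weighted mean of the targets in one node (illustrative, not sklearn code)."""
    y = np.asarray(y, dtype=float)
    if y.ndim == 1:
        y = y[:, None]  # shape (n_samples, n_outputs), as in the Cython loop

    # Assumption: w is the sample weight, defaulting to 1.0 for every sample.
    if sample_weight is None:
        w = np.ones(y.shape[0])
    else:
        w = np.asarray(sample_weight, dtype=float)

    # sum_total[k] += w * y[i, k] for every sample i in the node
    sum_total = (w[:, None] * y).sum(axis=0)
    weighted_n_node_samples = w.sum()

    # dest[k] = sum_total[k] / weighted_n_node_samples
    return sum_total / weighted_n_node_samples


y_node = np.array([1.0, 2.0, 6.0])
unweighted = node_value_sketch(y_node)             # plain mean of the targets
weighted = node_value_sketch(y_node, [1, 1, 2])    # last sample counts twice
```

With unit weights this reduces to the ordinary mean of the targets in the node, so the learning rate does not enter at this level.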
------- Original Message -------
On Tuesday, October 31st, 2023 at 4:55 PM, Guillaume Lemaître wrote:
> The values are always computed in the following manner:
>
> cdef void node_value(self, float64_t* dest) noexcept nogil:
>     """Compute the node value of sample_indices[start:end] into dest."""
>     cdef intp_t k
>
>     for k in range(self.n_outputs):
>         dest[k] = self.sum_total[k] / self.weighted_n_node_samples
>
> On Tue, 31 Oct 2023 at 16:42, Sole Galli wrote:
>
>> Hello again,
>>
>> I sorted things out in my head and I got at least a few numbers to match:
>>
>> In the first tree, we are fitting against y_train, hence:
>>
>> ```
>> # first tree, top node, square error
>>
>> np.mean((y_train - y_train.mean())**2)
>> ```
>>
>> returns as expected:
>>
>> 1.3308271915828764
>>
>> And the value, I assume, is given by:
>>
>> ```
>> np.mean(-2 * (y_train - y_train.mean()))
>>
>> ```
>>
>> Is this correct?
>>
>> The second tree is fit against the residuals. Hence:
>>
>> ```
>> # second tree, top node, square error
>> residuals = y_train - 0.1 * gbm.estimators_[0][0].predict(X_train)
>> np.mean((residuals - np.mean(residuals))**2)
>>
>> ```
>>
>> returns the expected result, but now the value is not what I expect:
>>
>> ```
>> np.mean(-2 * (residuals - np.mean(residuals)))
>>
>> ```
>>
>> So I guess that is not how the value in the top node of the second tree is calculated?
>>
>> Thank you!
>>
>> ------- Original Message -------
>> On Tuesday, October 31st, 2023 at 3:51 PM, Guillaume Lemaître wrote:
>>
>>> You probably want to look at the following example section:
>>>
>>> https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html#plot-training-deviance
>>>
>>> On Tue, 31 Oct 2023 at 14:52, Sole Galli via scikit-learn wrote:
>>>
>>>> Hi Nicolas,
>>>>
>>>> Thank you so much for the links and explanation. I really appreciate it.
>>>>
>>>> I am struggling to reproduce the results though. There's probably something I don't understand.
>>>>
>>>> This is an image of the top node, of the first tree in the ensemble (GradientBoostingRegressor):
>>>>
>>>> [Screenshot 2023-10-31 at 14-39-06 4-gbm-local - Jupyter Notebook.png]
>>>>
>>>> How can I manually obtain the values for squared_error and value?
>>>>
>>>> I thought squared_error would be:
>>>>
>>>> np.mean((y_train - 0.1 * gbm.estimators_[0][0].predict(X_train))**2)
>>>>
>>>> And value would be:
>>>>
>>>> -2 * (y_train - 0.1 * gbm.estimators_[0][0].predict(X_train))
>>>>
>>>> But those calculations do not return the numbers shown in the node.
>>>>
>>>> Is there something obvious that I am doing wrong?
>>>>
>>>> Thanks a lot!
>>>>
>>>> Best
>>>> Sole
>>>>
>>>> ------- Original Message -------
>>>> On Monday, October 30th, 2023 at 5:34 PM, Nicolas Hug wrote:
>>>>
>>>>> The node values in GBDTs are an aggregation (typically a regularized average) of the gradients of the samples in that node.
>>>>>
>>>>> Each sample (x, y) is associated with a gradient computed as grad = d_loss(pred(x), y) / d_pred(x). These gradients are in the same physical dimension as the target (for regression). Some resources that may help:
>>>>>
>>>>> - https://explained.ai/gradient-boosting/descent.html
>>>>> - https://nicolas-hug.com/blog/gradient_boosting_descent (self plug)
>>>>>
>>>>> Nicolas
>>>>>
>>>>> On 30/10/2023 16:09, Sole Galli via scikit-learn wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> I am trying to interpret the outputs of gradient boosting machines sample by sample.
>>>>>>
>>>>>> What does the "value" in each node of each tree in a gbm regressor mean?
>>>>>>
>>>>>> [Untitled.png]
>>>>>>
>>>>>> In random forests, value is the mean target value of the observations seen at that node. At the top node it is usually the mean target value of the training set (or bootstrapped sample). Further down the tree, it is the mean target value of the samples in each child node.
>>>>>>
>>>>>> But in gradient boosting machines it is different. And I can't decipher how it is calculated.
>>>>>>
>>>>>> I expected the value in the first tree at the top node to be zero, because the residuals of the first tree are zero. But it is not exactly zero.
>>>>>>
>>>>>> In summary, how is the value at each node / tree calculated?
>>>>>>
>>>>>> Thanks a lot!!!
>>>>>>
>>>>>> Warm regards,
>>>>>> Sole
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>>
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>> --
>>>
>>> Guillaume Lemaitre
>>> Scikit-learn @ Inria Foundation
>>> https://glemaitre.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Untitled.png
Type: image/png
Size: 493231 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot 2023-10-31 at 14-39-06 4-gbm-local - Jupyter Notebook.png
Type: image/png
Size: 39833 bytes
Desc: not available
URL:
From g.lemaitre58 at gmail.com Wed Nov 1 07:04:30 2023
From: g.lemaitre58 at gmail.com (Guillaume Lemaître)
Date: Wed, 1 Nov 2023 12:04:30 +0100
Subject: [scikit-learn] what is "value" in the nodes of trees in a gbm?
In-Reply-To: <1TJAlJA2GciQtyqy5QSTC8Q0FsmnQ2_ipTtdsQLMTO50ezopzQsVxlhko6AofLQb4lF6a5PE_Wp6E9TY8Er-z90HwqYmkMHyBUeY10JyDUQ=@protonmail.com>
References:
<1TJAlJA2GciQtyqy5QSTC8Q0FsmnQ2_ipTtdsQLMTO50ezopzQsVxlhko6AofLQb4lF6a5PE_Wp6E9TY8Er-z90HwqYmkMHyBUeY10JyDUQ=@protonmail.com>
Message-ID:
The learning rate is not part of the tree. This is part of the boosting
loop where you update the raw predictions (residuals) of the next step:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/ensemble/_gb.py#L254-L257
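To make the point above concrete, here is a minimal sketch (not taken from the scikit-learn source) that rebuilds the first two boosting steps by hand and compares them with the numbers stored in the fitted trees. It assumes the default squared_error loss, the default initial prediction (the mean of the target), and subsample=1.0. The synthetic data from make_regression and all variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

learning_rate = 0.1
gbm = GradientBoostingRegressor(
    n_estimators=3, learning_rate=learning_rate, max_depth=2, random_state=0
)
gbm.fit(X, y)

tree0 = gbm.estimators_[0][0]
tree1 = gbm.estimators_[1][0]

# Step 0: the running prediction starts at the mean of y, so the first
# tree is fit on y - mean(y), not on y itself.
raw = np.full_like(y, y.mean())
residuals0 = y - raw
root_value0 = tree0.tree_.value[0, 0, 0]      # "value" shown in the top node
root_impurity0 = tree0.tree_.impurity[0]      # "squared_error" in the top node

# The learning rate is applied here, in the boosting loop, not inside the tree.
raw = raw + learning_rate * tree0.predict(X)
residuals1 = y - raw
root_value1 = tree1.tree_.value[0, 0, 0]
root_impurity1 = tree1.tree_.impurity[0]

# Further down the tree, a node's value is the mean residual of the
# samples that reach it; checked here for the leaf of the first sample.
leaf_ids = tree1.apply(X)
leaf = leaf_ids[0]
leaf_value = tree1.tree_.value[leaf, 0, 0]
leaf_mean = residuals1[leaf_ids == leaf].mean()
```

Under those assumptions, the value at the top node is simply the mean of the current residuals (close to zero, up to floating-point error), and squared_error is their variance. Note that the running prediction starts from y.mean(); leaving that term out shifts the mean of the residuals but not their variance, which would explain why the second-tree squared_error matched earlier in this thread while the value did not.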
On Wed, 1 Nov 2023 at 11:58, Sole Galli wrote:
> Thank you Guillaume,
>
> I dug a bit and found that sum_total is the sum of some weighted y:
>
> ```
> for k in range(self.n_outputs):
>     y_ik = self.y[i, k]
>     w_y_ik = w * y_ik
>     self.sum_total[k] += w_y_ik
> ```
>
> Is w the learning rate? And what is y exactly?
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
From solegalli at protonmail.com Wed Nov 1 09:56:43 2023
From: solegalli at protonmail.com (Sole Galli)
Date: Wed, 01 Nov 2023 13:56:43 +0000
Subject: [scikit-learn] what is "value" in the nodes of trees in a gbm?
In-Reply-To:
References:
<1TJAlJA2GciQtyqy5QSTC8Q0FsmnQ2_ipTtdsQLMTO50ezopzQsVxlhko6AofLQb4lF6a5PE_Wp6E9TY8Er-z90HwqYmkMHyBUeY10JyDUQ=@protonmail.com>
Message-ID:
Thanks for your email again Guillaume.
I am still stuck on this, so I left a question on Stack Overflow:
https://stackoverflow.com/questions/77396735/how-to-calculate-the-values-at-each-node-in-a-scikit-learn-gradientboostingregre
Cheers
Sole
------- Original Message -------
On Wednesday, November 1st, 2023 at 12:04 PM, Guillaume Lemaître wrote:
> The learning rate is not part of the tree. This is part of the boosting loop where you update the raw predictions (residuals) of the next step:
>
> https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/ensemble/_gb.py#L254-L257