What is "value" in the nodes of trees in a GBM?
Hello everyone,

I am trying to interpret the outputs of gradient boosting machines sample by sample.

What does the "value" in each node of each tree in a GBM regressor mean?

[image: Untitled.png]

In random forests, value is the mean target value of the observations seen at that node. At the top node, it is usually the mean target value of the training set (or bootstrapped sample). Further down the tree, it is the mean target value of the samples in each child node.

But in gradient boosting machines it is different, and I can't work out how it is calculated.

I expected the value at the top node of the first tree to be zero, because the residuals the first tree is fit on average to zero. But it is not exactly zero.

In summary, how is the value at each node / tree calculated?

Thanks a lot!!!

Warm regards,
Sole

Sent with [Proton Mail](https://proton.me/) secure email.
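The expectation in the last paragraph can be checked numerically. Below is a minimal NumPy sketch, assuming the ensemble starts from the mean of the training target (the default initial prediction for loss="squared_error") and using synthetic data: the residuals average to zero in exact arithmetic, but their mean computed in floating point is usually a tiny number rather than exactly 0, which is one plausible reason for a root value that is "not exactly zero".

```python
import numpy as np

# Synthetic training target (an assumption; the thread's data is not shown).
rng = np.random.default_rng(0)
y_train = rng.normal(loc=3.0, scale=1.2, size=1000)

# Initial prediction of the ensemble: the mean of the target.
init_prediction = y_train.mean()

# Residuals (negative gradients) that the first tree is fit on.
residuals = y_train - init_prediction

# Root "value" of the first tree = mean residual: zero in exact arithmetic,
# but usually a tiny nonzero number once rounding errors are involved.
root_value = residuals.mean()

# The root squared_error is the variance of the residuals, i.e. of y_train.
root_squared_error = np.mean((residuals - root_value) ** 2)

print(root_value, root_squared_error, np.var(y_train))
```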
The node values in GBDTs are an aggregation (typically a regularized average) of the *gradients* of the samples in that node.

Each sample (x, y) is associated with a gradient computed as grad = d_loss(pred(x), y) / d_pred(x). For regression, these gradients have the same physical dimension as the target.

Some resources that may help:

- https://explained.ai/gradient-boosting/descent.html
- https://nicolas-hug.com/blog/gradient_boosting_descent (self plug)

Nicolas

On 30/10/2023 16:09, Sole Galli via scikit-learn wrote:
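To make the gradient definition above concrete, here is a NumPy sketch for two common ways of writing the squared error. For the full squared error (y - p)^2 the gradient is -2 (y - p); for half the squared error, 0.5 (y - p)^2, it is p - y, so the negative gradient is exactly the residual y - p. As far as I know, scikit-learn's GradientBoostingRegressor with loss="squared_error" fits each tree to this residual, so a node's value is the plain mean of y - p over the samples in the node, with no factor of 2. The targets, predictions and node membership mask below are made up for illustration.

```python
import numpy as np


def grad_squared_error(y, pred):
    """Gradient of (y - pred)**2 with respect to pred."""
    return -2.0 * (y - pred)


def grad_half_squared_error(y, pred):
    """Gradient of 0.5 * (y - pred)**2 with respect to pred."""
    return pred - y


# Hypothetical targets and current ensemble predictions.
y = np.array([1.0, 2.0, 4.0, 7.0])
pred = np.array([3.0, 3.0, 3.0, 3.0])

# Hypothetical node: True for the samples that reach it.
in_node = np.array([True, True, False, True])

# Node value as the average negative gradient of the half squared error,
# i.e. the mean residual y - pred of the samples in the node.
node_value = np.mean(-grad_half_squared_error(y[in_node], pred[in_node]))

# With the full squared error, the same quantity needs a factor 1/2.
node_value_full = np.mean(-grad_squared_error(y[in_node], pred[in_node])) / 2.0

print(node_value, node_value_full)
```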
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
Hi Nicolas,

Thank you so much for the links and explanation. I really appreciate it.

I am struggling to reproduce the results, though. There's probably something I don't understand.

This is an image of the top node of the first tree in the ensemble (GradientBoostingRegressor):

[image: Screenshot 2023-10-31 at 14-39-06 4-gbm-local - Jupyter Notebook.png]

How can I manually obtain the values for squared_error and value?

I thought squared_error would be:

`np.mean((y_train - 0.1 * gbm.estimators_[0][0].predict(X_train))**2)`

And value would be:

`-2 * (y_train - 0.1 * gbm.estimators_[0][0].predict(X_train))`

But those calculations do not return the numbers shown in the node.

Is there something obvious that I am doing wrong?

Thanks a lot!

Best,
Sole

------- Original Message ------- On Monday, October 30th, 2023 at 5:34 PM, Nicolas Hug <niourf@gmail.com> wrote:
You probably want to look at the following example section:

https://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boostin...

On Tue, 31 Oct 2023 at 14:52, Sole Galli via scikit-learn <scikit-learn@python.org> wrote:
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
Hello again,

I sorted things out in my head and got at least a few numbers to match.

In the first tree, we are fitting against y_train, hence:

```
# first tree, top node, squared error
np.mean((y_train - y_train.mean())**2)
```

returns, as expected:

1.3308271915828764

And the value, I assume, is given by:

```
np.mean(-2 * (y_train - y_train.mean()))
```

Is this correct?

The second tree is fit against the residuals. Hence:

```
# second tree, top node, squared error
residuals = y_train - 0.1 * gbm.estimators_[0][0].predict(X_train)
np.mean((residuals - np.mean(residuals))**2)
```

returns the expected result, but now the value is not what I expect:

```
np.mean(-2 * (residuals - np.mean(residuals)))
```

So I guess that is not how the value in the top node of the second tree is calculated?

Thank you!

------- Original Message ------- On Tuesday, October 31st, 2023 at 3:51 PM, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
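A sketch of the bookkeeping for every node of every tree, under the same assumptions (scikit-learn defaults, subsample=1.0, synthetic data). The running prediction includes the initial prediction gbm.init_.predict(X_train), not only the learning rate times the previous trees; leaving it out shifts every residual by the mean of y_train, which leaves a variance (squared_error) unchanged but changes a mean (value). With unit weights, each tree's leaf means also preserve the overall mean of the residuals, so by that reasoning the root value should stay at about zero for every tree, while the other nodes carry the mean residual of their samples. The sketch uses decision_path to find which samples reach each node.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (an assumption; the thread's data set is not shown).
X_train, y_train = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

learning_rate = 0.1
gbm = GradientBoostingRegressor(
    n_estimators=5, learning_rate=learning_rate, max_depth=2, random_state=0
)
gbm.fit(X_train, y_train)

# Running (raw) prediction, starting from the initial estimator (mean of y_train).
raw_prediction = gbm.init_.predict(X_train)

max_abs_diff = 0.0
for stage in range(gbm.n_estimators_):
    tree = gbm.estimators_[stage][0]

    # Target of this tree: residuals w.r.t. the init prediction PLUS all previous trees.
    residuals = y_train - raw_prediction

    # Which training samples pass through which node: shape (n_samples, n_nodes).
    in_node = tree.decision_path(X_train).toarray().astype(bool)

    for node_id in range(tree.tree_.node_count):
        expected = residuals[in_node[:, node_id]].mean()  # mean residual in the node
        reported = tree.tree_.value[node_id, 0, 0]        # "value" shown by plot_tree
        max_abs_diff = max(max_abs_diff, abs(reported - expected))

    # The learning rate is applied here, in the boosting loop, not inside the tree.
    raw_prediction = raw_prediction + learning_rate * tree.predict(X_train)

print(max_abs_diff)  # should be tiny (floating-point noise)
```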
The values are always computed in the following manner:

```
cdef void node_value(self, float64_t* dest) noexcept nogil:
    """Compute the node value of sample_indices[start:end] into dest."""
    cdef intp_t k

    for k in range(self.n_outputs):
        dest[k] = self.sum_total[k] / self.weighted_n_node_samples
```

On Tue, 31 Oct 2023 at 16:42, Sole Galli <solegalli@protonmail.com> wrote:
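A plain-Python transliteration of the Cython snippet above for a single output. The variable names mirror the snippet (sum_total, weighted_n_node_samples), but the function itself is a hypothetical illustration, not scikit-learn code: the node value is the weighted mean of the tree's training target over the samples in the node, so with unit sample weights it is simply the arithmetic mean.

```python
def node_value(targets, weights):
    """Weighted mean of `targets`, mirroring sum_total / weighted_n_node_samples."""
    sum_total = 0.0
    weighted_n_node_samples = 0.0
    for y_i, w in zip(targets, weights):
        sum_total += w * y_i          # accumulate w * y for each sample in the node
        weighted_n_node_samples += w  # accumulate the weights themselves
    return sum_total / weighted_n_node_samples


# Hypothetical residuals reaching a node, with unit sample weights.
residuals = [-2.0, -1.0, 4.0]
print(node_value(residuals, [1.0, 1.0, 1.0]))  # the plain mean, 1/3
```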
Thank you, Guillaume.

I dug a bit and found that sum_total is the sum of some weighted y:

```
for k in range(self.n_outputs):
    y_ik = self.y[i, k]
    w_y_ik = w * y_ik
    self.sum_total[k] += w_y_ik
```

Is w the learning rate? And what exactly is y?

Thank you!

Best wishes,
Sole

------- Original Message ------- On Tuesday, October 31st, 2023 at 4:55 PM, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
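On the two questions in that message: as far as I can tell from the scikit-learn source, y in that loop is the target the DecisionTreeRegressor was fit on, which inside GradientBoostingRegressor is the vector of residuals (negative gradients), and w is the sample weight of sample i, not the learning rate. One case where w differs from 1 without user-supplied weights is subsample < 1.0, where the boosting loop appears to multiply sample_weight by the in-bag mask so that out-of-bag samples get weight 0. The plain-Python sketch below uses made-up numbers and a hypothetical mask to show how such weights change the root value.

```python
# Hypothetical residuals (the tree's training target y) for six samples.
residuals = [1.0, -0.5, 2.0, -1.5, 0.5, 3.5]

# User-supplied sample weights (all ones when none are given).
sample_weight = [1.0] * len(residuals)

# Hypothetical in-bag mask for subsample < 1.0: 1 = used to fit this tree, 0 = out-of-bag.
in_bag = [1, 0, 1, 1, 0, 1]

# Effective weights w seen by the tree criterion.
w = [sw * m for sw, m in zip(sample_weight, in_bag)]

# Root value: sum of w * y divided by the sum of w (no learning rate involved).
root_value = sum(wi * yi for wi, yi in zip(w, residuals)) / sum(w)
print(root_value)  # 1.25, the mean of the four in-bag residuals
```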
The learning rate is not part of the tree. It belongs to the boosting loop, where the raw predictions are updated, and with them the residuals used to fit the next tree:

https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/ensemble/_gb....

On Wed, 1 Nov 2023 at 11:58, Sole Galli <solegalli@protonmail.com> wrote:
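A self-contained NumPy sketch of the boosting loop described above, with a hand-written depth-1 regression tree (stump) on a single feature standing in for DecisionTreeRegressor; fit_stump, predict_stump and the synthetic data are hypothetical. The stump stores mean residuals in its nodes and never sees the learning rate; the learning rate only scales the stump's output when the running prediction is updated, which in turn changes the residuals, and hence the node values, of the next stump.

```python
import numpy as np


def fit_stump(x, target):
    """Fit a depth-1 regression tree; return (threshold, left_value, right_value, root_value)."""
    root_value = target.mean()  # value shown at the root node
    best = (np.inf, None, root_value, root_value)
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_mean, right_mean = target[left].mean(), target[~left].mean()
        sse = ((target[left] - left_mean) ** 2).sum() + ((target[~left] - right_mean) ** 2).sum()
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value, root_value


def predict_stump(stump, x):
    threshold, left_value, right_value, root_value = stump
    if threshold is None:  # no valid split: predict the root value everywhere
        return np.full(x.shape, root_value)
    return np.where(x <= threshold, left_value, right_value)


# Synthetic 1-D regression problem.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

learning_rate = 0.1
raw_prediction = np.full_like(y, y.mean())  # initial prediction: mean of y
root_values = []
for _ in range(5):
    residuals = y - raw_prediction           # negative gradient of the half squared error
    stump = fit_stump(x, residuals)          # the tree only sees the residuals
    root_values.append(stump[3])
    raw_prediction += learning_rate * predict_stump(stump, x)  # learning rate applied here

print(root_values)
```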
Thanks again for your email, Guillaume. I am still going around in circles on this, so I left a question on Stack Overflow: https://stackoverflow.com/questions/77396735/how-to-calculate-the-values-at-...

Cheers,
Sole

------- Original Message ------- On Wednesday, November 1st, 2023 at 12:04 PM, Guillaume Lemaître <g.lemaitre58@gmail.com> wrote:
participants (3)

- Guillaume Lemaître
- Nicolas Hug
- Sole Galli