[scikit-learn] custom loss function in RandomForestRegressor
Thomas Evangelidis
tevang3 at gmail.com
Thu Mar 1 09:39:43 EST 2018
Hi Sebastian,
Going back to the Pearson's R loss function: does this imply that I must add
an abstract "init2" method to RegressionCriterion (the class that MSE
inherits from), to which I would pass the target values X as an extra
argument? And would the node impurity then be 1 - R (the lower, the
better)? What about the impurities of the left and right splits? In the
MSE class they are (sum_{i=1}^n y_i)^2, where n is the number of samples
in the respective split. It is not clear to me how this relates to the
variance, so I do not see how to adapt it for my purpose.
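The relation between those squared sums and the variance can be sketched in plain Python (toy targets and hypothetical helper names, not scikit-learn code). As I read the scikit-learn source, the MSE criterion ranks candidate splits by sum_L^2/n_L + sum_R^2/n_R, i.e. the squared sums divided by the sample counts; treat that as an assumption to check against your version. Since n * variance = sq_sum - sum^2/n for each child, the children's total squared error equals the parent's sq_sum minus that proxy. The parent's sq_sum is the same for every split of a node, so maximizing the proxy is the same as minimizing the children's variance.

```python
def node_variance(y):
    """Population variance written as the criterion computes it: sq_sum/n - mean^2."""
    n = len(y)
    s = sum(y)
    sq = sum(v * v for v in y)
    return sq / n - (s / n) ** 2


def children_sse(y_left, y_right):
    """Total squared error of the two children around their own means."""
    return len(y_left) * node_variance(y_left) + len(y_right) * node_variance(y_right)


def proxy_improvement(y_left, y_right):
    """Assumed split-ranking proxy: squared sums divided by counts (larger is better)."""
    return sum(y_left) ** 2 / len(y_left) + sum(y_right) ** 2 / len(y_right)


# Toy targets, assumed already sorted by one feature; split after position i.
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.0]
positions = range(1, len(y))

best_by_sse = min(positions, key=lambda i: children_sse(y[:i], y[i:]))
best_by_proxy = max(positions, key=lambda i: proxy_improvement(y[:i], y[i:]))
print(best_by_sse, best_by_proxy)

# Identity behind the equivalence: children SSE = parent sq_sum - proxy, for every split.
sq_sum = sum(v * v for v in y)
for i in positions:
    assert abs(children_sse(y[:i], y[i:]) - (sq_sum - proxy_improvement(y[:i], y[i:]))) < 1e-9
```

Both rankings pick the same split position, which is why the criterion can skip computing the variances explicitly while searching for the best split.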
On Mar 1, 2018 14:56, "Sebastian Raschka" <se.raschka at gmail.com> wrote:
Hi, Thomas,
In regression trees, minimizing the variance among the target values is
equivalent to minimizing the MSE between targets and predicted values,
because each node predicts the mean of its target values. This is also
called variance reduction: https://en.wikipedia.org/wiki/
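A small numeric check of that equivalence, in plain Python with toy targets chosen for illustration: a node predicts a single constant c, the MSE of a constant prediction is smallest at c = mean(y), and that smallest MSE equals the (population) variance of y.

```python
def mse_of_constant(y, c):
    """Mean squared error when every sample in the node is predicted as c."""
    return sum((v - c) ** 2 for v in y) / len(y)


def variance(y):
    """Population variance (divide by n, as the tree criterion does)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / len(y)


y = [2.0, 3.0, 7.0, 8.0]
mean = sum(y) / len(y)

# Among a grid of candidate constants around the mean, the mean gives the lowest MSE...
candidates = [mean + d / 10 for d in range(-20, 21)]
best_c = min(candidates, key=lambda c: mse_of_constant(y, c))

# ...and the MSE at the mean is exactly the node variance.
print(best_c, mse_of_constant(y, mean), variance(y))
```

Algebraically, MSE(c) = variance(y) + (c - mean)^2, so the predictions do enter the criterion, but only through the node mean, which is fully determined by the targets.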
> On Mar 1, 2018, at 8:27 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> Hi again,
> I am currently revisiting this problem after familiarizing myself with
> Cython and scikit-learn's code, and I have a very important question:
> Looking at the class MSE(RegressionCriterion), the node impurity is
> defined as the variance of the target values Y on that node. The
> predictions X are not involved anywhere in the computations. This
> contradicts my notion of a "loss function", which quantifies the
> discrepancy between predicted and target values. Am I looking at the
> wrong class, or is what I want to do just not feasible with Random
> Forests? For example, I would like to modify the RandomForestRegressor
> code to minimize 1 - R, where R is Pearson's correlation coefficient
> between predicted and target values.
> I thank you in advance for any clarification.
> Thomas
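For reference, a minimal sketch of the 1 - R loss discussed in this thread, in plain Python with a hypothetical helper name `one_minus_pearson`. It compares a vector of predictions with a vector of targets. Within a single tree node every sample receives the same prediction, so the prediction vector has zero variance and R is undefined there; the sketch returns None in that case (an illustrative choice, not a convention from any library).

```python
import math


def one_minus_pearson(pred, target):
    """Return 1 - R for two equal-length sequences, or None if R is undefined."""
    n = len(pred)
    if n != len(target) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mp = sum(pred) / n
    mt = sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        # Zero denominator in Pearson's R, e.g. a constant prediction within one node.
        return None
    r = cov / math.sqrt(var_p * var_t)
    return 1.0 - r


target = [1.0, 2.0, 3.0, 4.0]
print(one_minus_pearson([1.1, 1.9, 3.2, 3.8], target))  # close to a linear relation
print(one_minus_pearson([2.0, 4.0, 6.0, 8.0], target))  # predictions are 2 * target
print(one_minus_pearson([2.5, 2.5, 2.5, 2.5], target))  # constant prediction
```

Note that 1 - R is invariant to the scale and offset of the predictions: the second call scores a perfect 0 even though every prediction is twice its target.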
> On 02/15/2018 01:28 PM, Guillaume Lemaitre wrote:
>> Yes, you are right: the .pxd files are the headers and the .pyx files
>> contain the definitions. You need to write a class like MSE. Criterion
>> is an abstract (base) class (I don't have the code in front of me).
>> @Andy: if I recall the PR correctly, we made the classes public to
>> enable such custom criteria. However, this is not documented, since we
>> were not officially supporting it. So this is a hidden feature. We could
>> always discuss making this feature more visible and documenting it.
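To make the "write a class like MSE" advice concrete, here is a plain-Python analogy of the structure. It is not scikit-learn's Cython `Criterion` API, whose methods are `cdef` functions with different signatures that work on ranges of sample indices; the names `SplitCriterion`, `SquaredErrorCriterion` and `best_split` are hypothetical. It illustrates two points from the thread: the split search only calls the criterion's impurity methods, so a subclass changes what the tree optimizes, and those methods only receive the target values of the samples in a node.

```python
from abc import ABC, abstractmethod


class SplitCriterion(ABC):
    """Hypothetical pure-Python stand-in for an abstract split criterion."""

    @abstractmethod
    def node_impurity(self, y):
        """Impurity of a node holding target values y."""

    def children_impurity(self, y_left, y_right):
        """Impurities of the left and right children of a candidate split."""
        return self.node_impurity(y_left), self.node_impurity(y_right)


class SquaredErrorCriterion(SplitCriterion):
    """Variance of the targets, the quantity the MSE criterion minimizes."""

    def node_impurity(self, y):
        mean = sum(y) / len(y)
        return sum((v - mean) ** 2 for v in y) / len(y)


def best_split(x, y, criterion):
    """Return (threshold, weighted child impurity) of the best split on one feature."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(ys)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold separates equal feature values
        left, right = ys[:i], ys[i:]
        imp_l, imp_r = criterion.children_impurity(left, right)
        score = (len(left) * imp_l + len(right) * imp_r) / n
        if best is None or score < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, score)
    return best


x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
y = [1.0, 1.1, 0.9, 4.0, 4.2, 3.8]
print(best_split(x, y, SquaredErrorCriterion()))
```

A custom criterion would subclass `SplitCriterion` and override `node_impurity` (and, if needed, `children_impurity`); `best_split` stays unchanged.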
> --
> ======================================================================
> Dr Thomas Evangelidis
> Post-doctoral Researcher
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/2S049,
> 62500 Brno, Czech Republic
> email: tevang at pharm.uoa.gr
> tevang3 at gmail.com
> website: https://sites.google.com/site/thomasevangelidishomepage/
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn