[scikit-learn] custom loss function in RandomForestRegressor

Sebastian Raschka se.raschka at gmail.com
Thu Mar 1 09:47:54 EST 2018


Hi, Thomas,

as far as I know, it doesn't matter: you would get the same splits either way, since R^2 is just a rescaled MSE (R^2 = 1 - MSE / Var(y), and Var(y) is fixed for a given set of targets).
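
To make this concrete, here is a minimal NumPy sketch, not scikit-learn's actual Cython code. The helper split_scores and the toy data are made up for illustration. It scores every split position of a single feature three ways: by the summed squared error of the two child means, by R^2, and by the sum-squared proxy S_L^2/n_L + S_R^2/n_R (dividing by the child sizes is my assumption about what the proxy should look like).

```python
import numpy as np


def split_scores(y_sorted):
    """For every split position, return (SSE, R^2, proxy) of the two-leaf fit.

    Hypothetical helper for illustration only; y_sorted holds the targets
    ordered by the feature being split on.
    """
    n = len(y_sorted)
    ss_tot = ((y_sorted - y_sorted.mean()) ** 2).sum()
    sse, r2, proxy = [], [], []
    for pos in range(1, n):
        left, right = y_sorted[:pos], y_sorted[pos:]
        # Each child predicts its own mean, so its SSE is n_child * variance.
        s = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        sse.append(s)
        # R^2 is a decreasing linear function of the SSE.
        r2.append(1.0 - s / ss_tot)
        # Squared sums of the children, each divided by the child size.
        proxy.append(left.sum() ** 2 / len(left) + right.sum() ** 2 / len(right))
    return np.array(sse), np.array(r2), np.array(proxy)


# Toy data: a noisy step function of a single feature.
rng = np.random.default_rng(0)
x = rng.uniform(size=40)
y = 2.0 * (x > 0.5) + rng.normal(scale=0.1, size=40)
y_sorted = y[np.argsort(x)]

sse, r2, proxy = split_scores(y_sorted)
best_mse = int(np.argmin(sse))      # split minimizing the MSE
best_r2 = int(np.argmax(r2))        # split maximizing R^2
best_proxy = int(np.argmax(proxy))  # split maximizing the proxy

# Pearson's r between the targets and the two-leaf predictions at that split.
pos = best_mse + 1
pred = np.where(np.arange(len(y_sorted)) < pos,
                y_sorted[:pos].mean(), y_sorted[pos:].mean())
pearson_r = np.corrcoef(y_sorted, pred)[0, 1]

print(best_mse, best_r2, best_proxy)
print(pearson_r ** 2, r2[best_mse])
```

All three criteria should pick the same split position. Because each leaf predicts the mean of its own training targets, the squared Pearson's r between targets and predictions should also match R^2 on the training data.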

Best,
Sebastian

> On Mar 1, 2018, at 9:39 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> 
> Hi Sebastian, 
> 
> Going back to the Pearson's R loss function, does this imply that I must add an abstract "init2" method to RegressionCriterion (which is what the MSE class inherits from), taking the predicted values X as an extra argument? And would the node impurity then be 1-R (the lower the better)? What about the impurities of the left and right split? In the MSE class they are (sum_i^n y_i)**2, where n is the number of samples in the respective split. It is not clear to me how this relates to the variance, so I don't see how to adapt it for my purpose. 
> 
> Best, 
> Thomas
> 
> 
> On Mar 1, 2018 14:56, "Sebastian Raschka" <se.raschka at gmail.com> wrote:
> Hi, Thomas,
> 
> in regression trees, minimizing the variance among the target values is equivalent to minimizing the MSE between targets and predicted values. This is also called variance reduction: https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction
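> 
> A short derivation may help here. This is a sketch in notation introduced only for this purpose: \bar{y} is the node mean, and S_L, S_R, n_L, n_R are the sums of targets and sample counts in the left and right child. A node that predicts the mean of its targets has an MSE equal to their variance:
> 
> ```latex
> \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \mathrm{Var}(y),
> \qquad
> n\,\mathrm{Var}(y) = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2
> ```
> 
> ```latex
> n_L \mathrm{Var}_L + n_R \mathrm{Var}_R
>   = \sum_{i=1}^{n} y_i^2 - \frac{S_L^2}{n_L} - \frac{S_R^2}{n_R}
> ```
> 
> The first term on the right is the same for every candidate split of the node, so minimizing the weighted child variance is the same as maximizing S_L^2/n_L + S_R^2/n_R. That is how squared sums of the targets can stand in for the variance when ranking splits.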
> 
> Best,
> Sebastian
> 
> > On Mar 1, 2018, at 8:27 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> >
> >
> > Hi again,
> >
> > I am currently revisiting this problem after familiarizing myself with Cython and Scikit-Learn's code and I have a very important query:
> >
> > Looking at the class MSE(RegressionCriterion), the node impurity is defined as the variance of the target values Y at that node. The predictions X are not involved anywhere in the computation. This contradicts my notion of a "loss function", which quantifies the discrepancy between predicted and target values. Am I looking at the wrong class, or is what I want to do just not feasible with Random Forests? For example, I would like to modify the RandomForestRegressor code to maximize Pearson's R between predicted and target values.
> >
> > I thank you in advance for any clarification.
> > Thomas
> >
> >
> >
> >
> > On 02/15/2018 01:28 PM, Guillaume Lemaitre wrote:
> >> Yes, you are right: the .pxd files are the headers and the .pyx files the definitions. You need to write a class like MSE. Criterion is an abstract base class (I don't have the code in front of me).
> >>
> >> @Andy: if I recall the PR correctly, we made the classes public to enable such custom criteria. However, it is not documented since we were not officially supporting it, so this is a hidden feature. We could always discuss making this feature more visible and documenting it.
> >
> >
> >
> >
> >
> > --
> > ======================================================================
> > Dr Thomas Evangelidis
> > Post-doctoral Researcher
> > CEITEC - Central European Institute of Technology
> > Masaryk University
> > Kamenice 5/A35/2S049,
> > 62500 Brno, Czech Republic
> >
> > email: tevang at pharm.uoa.gr
> >               tevang3 at gmail.com
> >
> > website: https://sites.google.com/site/thomasevangelidishomepage/
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> 


