[scikit-learn] custom loss function in RandomForestRegressor
Sebastian Raschka
se.raschka at gmail.com
Thu Mar 1 10:03:45 EST 2018
Unfortunately (or maybe fortunately :)) no, maximizing variance reduction & minimizing MSE are just special cases :)
Best,
Sebastian
> On Mar 1, 2018, at 9:59 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
>
> Does this generalize to any loss function? For example I also want to implement Kendall's tau correlation coefficient and a combination of R, tau and RMSE. :)
>
> On Mar 1, 2018 15:49, "Sebastian Raschka" <se.raschka at gmail.com> wrote:
> Hi, Thomas,
>
> as far as I know, it makes no difference: you would get the same splits, since R^2 is just a rescaled MSE.
>
> Best,
> Sebastian
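
A small sketch of why the two criteria rank candidates identically, assuming R^2 here means the coefficient of determination, 1 - SSE/SST (which in general is not the same thing as Pearson's R). For a fixed set of targets, SST is a constant, so R^2 = 1 - MSE/Var(y) is a decreasing linear function of MSE. The function names and the numbers are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SSE/SST."""
    m = mean(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - m) ** 2 for t in y_true)
    return 1.0 - sse / sst

# Hypothetical targets and three candidate sets of predictions
y_true = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "a": [1.1, 2.1, 2.9, 4.2],
    "b": [2.5, 2.5, 2.5, 2.5],
    "c": [0.0, 0.0, 0.0, 0.0],
}

# Variance of the targets = MSE of predicting their mean everywhere
var_y = mse(y_true, [mean(y_true)] * len(y_true))

# Ranking by increasing MSE and by decreasing R^2 gives the same order
by_mse = sorted(candidates, key=lambda k: mse(y_true, candidates[k]))
by_r2 = sorted(candidates, key=lambda k: r2(y_true, candidates[k]), reverse=True)
print(by_mse, by_r2)
```

Because the rescaling depends only on the targets, any search that picks the candidate with the lowest MSE picks the one with the highest R^2 as well.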
>
> > On Mar 1, 2018, at 9:39 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> >
> > Hi Sebastian,
> >
> > Going back to the Pearson's R loss function, does this imply that I must add an abstract "init2" method to RegressionCriterion (the class that MSE inherits from), which takes the predicted values X as an extra argument? And then the node impurity will be 1-R (the lower the better)? What about the impurities of the left and right splits? In the MSE class they are (sum_i^n y_i)**2, where n is the number of samples in the respective split. It is not clear to me how this relates to variance, so that I can adapt it for my purpose.
> >
> > Best,
> > Thomas
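
One way to see the connection between squared sums and variance, written out as a short derivation. Here S denotes the sum of the targets in a node, \bar{y} their mean, and L and R the left and right children; this notation is introduced for the note and is not taken from the scikit-learn source.

```latex
\sum_{i=1}^{n} (y_i - \bar{y})^2
  = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\Big(\sum_{i=1}^{n} y_i\Big)^2
  = \sum_{i=1}^{n} y_i^2 - \frac{S^2}{n}

\mathrm{SSE}_L + \mathrm{SSE}_R
  = \sum_{i \in \mathrm{parent}} y_i^2
    - \Big(\frac{S_L^2}{n_L} + \frac{S_R^2}{n_R}\Big)
```

The first term on the right is identical for every candidate split of the same parent node, so minimizing the summed squared error of the children (equivalently, their sample-weighted variance) is the same as maximizing S_L^2/n_L + S_R^2/n_R. That would explain why squared sums of y appear where one expects a variance, presumably divided by the number of samples in each child.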
> >
> >
> > On Mar 1, 2018 14:56, "Sebastian Raschka" <se.raschka at gmail.com> wrote:
> > Hi, Thomas,
> >
> > in regression trees, minimizing the variance among the target values is equivalent to minimizing the MSE between targets and predicted values. This is also called variance reduction: https://en.wikipedia.org/wiki/Decision_tree_learning#Variance_reduction
> >
> > Best,
> > Sebastian
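
To make the equivalence above concrete, here is a minimal sketch in plain Python (not scikit-learn's Cython code). It assumes each node predicts the mean of its own targets; the helper names and the toy data are made up for illustration. Under that assumption the MSE between targets and predictions in a node is exactly the variance of the targets, which is why the predictions never need to appear explicitly in the criterion.

```python
def mean(values):
    return sum(values) / len(values)

def variance(y):
    """Population variance of the target values in a node."""
    m = mean(y)
    return mean([(t - m) ** 2 for t in y])

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def best_threshold(x, y):
    """Scan one feature; return (weighted child variance, threshold) of the best split."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        # Sample-weighted average of the two child variances
        score = (len(left) * variance(left) + len(right) * variance(right)) / n
        if best is None or score < best[0]:
            best = (score, (lo + hi) / 2)
    return best

# Toy data (hypothetical): two well-separated groups of targets
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# A node that predicts its mean has MSE equal to the variance of its targets
node_prediction = [mean(y)] * len(y)
print(variance(y), mse(y, node_prediction))

# The split that minimises weighted child variance separates the two groups
print(best_threshold(x, y))
```

Choosing the split with the lowest weighted child variance is therefore the same as choosing the split whose two mean-predicting children have the lowest overall MSE.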
> >
> > > On Mar 1, 2018, at 8:27 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> > >
> > >
> > > Hi again,
> > >
> > > I am currently revisiting this problem after familiarizing myself with Cython and Scikit-Learn's code and I have a very important query:
> > >
> > > Looking at the class MSE(RegressionCriterion), the node impurity is defined as the variance of the target values Y on that node. The predictions X are nowhere involved in the computations. This contradicts my notion of a "loss function", which quantifies the discrepancy between predicted and target values. Am I looking at the wrong class, or is what I want to do just not feasible with Random Forests? For example, I would like to modify the RandomForestRegressor code to maximize Pearson's R between predicted and target values.
> > >
> > > I thank you in advance for any clarification.
> > > Thomas
> > >
> > >
> > >
> > >
> > > On 02/15/2018 01:28 PM, Guillaume Lemaitre wrote:
> > >> Yes, you are right: the .pxd files are the headers and the .pyx files the definitions. You need to write a class like MSE. Criterion is an abstract class or base class (I don't have the code in front of me).
> > >>
> > >> @Andy: if I recall the PR correctly, we made the classes public to enable such custom criteria. However, this is not documented, since we were not officially supporting it, so it is a hidden feature. We could always discuss making this feature more visible and documenting it.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > ======================================================================
> > > Dr Thomas Evangelidis
> > > Post-doctoral Researcher
> > > CEITEC - Central European Institute of Technology
> > > Masaryk University
> > > Kamenice 5/A35/2S049,
> > > 62500 Brno, Czech Republic
> > >
> > > email: tevang at pharm.uoa.gr
> > > tevang3 at gmail.com
> > >
> > > website: https://sites.google.com/site/thomasevangelidishomepage/
> > >
> > >
> > > _______________________________________________
> > > scikit-learn mailing list
> > > scikit-learn at python.org
> > > https://mail.python.org/mailman/listinfo/scikit-learn