[scikit-learn] custom loss function in RandomForestRegressor

Thomas Evangelidis tevang3 at gmail.com
Thu Mar 1 08:27:14 EST 2018


Hi again,

I am currently revisiting this problem after familiarizing myself with
Cython and Scikit-Learn's code and I have a very important query:

Looking at the class MSE(RegressionCriterion), the node impurity is defined
as the variance of the target values y on that node. The predictions are
not involved anywhere in the computation. This contradicts my notion of a
"loss function", which quantifies the discrepancy between predicted and
target values. Am I looking at the wrong class, or is what I want to do
simply not feasible with Random Forests? For example, I would like to
modify the RandomForestRegressor code to maximize the Pearson's R between
predicted and target values.
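
To make the question concrete, here is a minimal pure-Python sketch of how
I currently understand the criterion. It is not scikit-learn's Cython code,
and the names node_impurity, mse_of_constant_prediction and best_split are
hypothetical helpers of mine. It assumes that a leaf predicts the mean of
its targets, in which case the variance of y on a node is the same number
as the mean squared error of that constant prediction, so the prediction
would be implicit in the impurity rather than absent.

```python
def node_impurity(y):
    """Population variance of the targets in a node (assumed MSE impurity)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def mse_of_constant_prediction(y, prediction):
    """Mean squared error when every sample in the node gets one prediction."""
    return sum((v - prediction) ** 2 for v in y) / len(y)


def best_split(x, y):
    """Exhaustive search over one feature for the threshold that minimizes
    the size-weighted impurity of the two child nodes."""
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    best = None
    for k in range(1, n):
        lo, hi = order[k - 1], order[k]
        if x[lo] == x[hi]:
            continue  # no threshold can separate identical feature values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        cost = (len(left) * node_impurity(left)
                + len(right) * node_impurity(right)) / n
        threshold = (x[lo] + x[hi]) / 2
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Toy data (made up for illustration): two well-separated groups.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

node_mean = sum(y) / len(y)
print(node_impurity(y), mse_of_constant_prediction(y, node_mean))
print(best_split(x, y))
```

If this reading is right, each split is scored only from the targets that
fall into the two children, which is why a global measure such as Pearson's
R over all predictions does not obviously fit the same interface.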

I thank you in advance for any clarification.
Thomas



>
>> On 02/15/2018 01:28 PM, Guillaume Lemaitre wrote:
>>
>> Yes, you are right: pxd files are the headers and pyx files the
>> definitions. You need to write a class like MSE. Criterion is an abstract
>> class or base class (I don't have it in front of me).
>>
>> @Andy: if I recall the PR, we made the classes public to enable such
>> custom criteria. However, it is not documented since we were not
>> officially supporting it. So this is a hidden feature. We could always
>> discuss making this feature more visible and documenting it.
>>
>>
>>
>


-- 

======================================================================

Dr Thomas Evangelidis

Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tevang at pharm.uoa.gr

          tevang3 at gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/