[scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)

Nicolas Hug niourf at gmail.com
Fri Feb 14 07:58:18 EST 2020


Hi Paul,

The way multioutput is handled in decision trees (and thus in the 
forests) is described in 
https://scikit-learn.org/stable/modules/tree.html#multi-output-problems. 
As you can see, the correlation between the output values *is* taken 
into account.
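
As a minimal sketch of what that page describes (the synthetic data and
variable names below are assumptions made for illustration), a single
RandomForestRegressor can be fitted on a 2-D target array. Per the linked
documentation, each split is then chosen with a criterion that averages the
impurity reduction over all outputs, rather than fitting one forest per
output:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only: two correlated targets.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 3))
y1 = X[:, 0] + 0.1 * rng.randn(200)
y2 = 2 * y1 + X[:, 1] + 0.1 * rng.randn(200)
Y = np.column_stack([y1, y2])  # shape (n_samples, n_outputs)

# One forest, fitted jointly on both outputs.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, Y)

pred = forest.predict(X[:5])
print(pred.shape)          # (5, 2): one column per output
print(forest.n_outputs_)   # 2
```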

Can you explain what you would like to modify there?
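
Regarding the Mahalanobis distance and the "preprocessing step" mentioned
in the quoted thread below, here is one possible sketch, not a confirmed
equivalent of the R package. It assumes the covariance is estimated once
from all training outputs: whitening the outputs with a Cholesky factor of
that covariance makes Euclidean distances in the transformed space equal to
Mahalanobis distances in the original space. The data and variable names
are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only: two correlated targets.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 3))
y1 = X[:, 0] + 0.1 * rng.randn(200)
y2 = 2 * y1 + X[:, 1] + 0.1 * rng.randn(200)
Y = np.column_stack([y1, y2])

# Whiten the outputs: cov = L @ L.T, and each row z = L^{-1} (y - mean).
mean = Y.mean(axis=0)
cov = np.cov(Y, rowvar=False)
L = np.linalg.cholesky(cov)
Z = np.linalg.solve(L, (Y - mean).T).T

# Sanity check: Euclidean distance between whitened rows matches the
# Mahalanobis distance between the original rows.
d_maha = mahalanobis(Y[0], Y[1], np.linalg.inv(cov))
d_eucl = np.linalg.norm(Z[0] - Z[1])
print(np.isclose(d_maha, d_eucl))

# Fit on the whitened targets, then map predictions back.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Z)
Y_pred = forest.predict(X[:5]) @ L.T + mean
```

With this transform, the squared-error criterion on Z measures squared
Mahalanobis distance to the node mean in the original output space, under
the single global covariance assumed above.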

Nicolas

On 2/14/20 7:37 AM, Paul Chike Ofoche via scikit-learn wrote:
> Scikit-learn random forest does *not* handle the multi-output case, 
> but only maps to each output one at a time, thereby not accounting for 
> the correlation between multi-outputs, which is what the Mahalanobis 
> distance does. I, as well as other researchers, have observed this 
> issue for as long as two years. Could there be a solution to implement 
> it in RandomForest, since Python already has a function that computes 
> Mahalanobis distances?
>
>
> On Thursday, February 13, 2020, 10:15:11 PM CST, Andreas Mueller 
> <t3kcit at gmail.com> wrote:
>
>
>
>
> On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:
>
> Hello all,
>
> My name is Paul and I am enthused about data science. I have been 
> using Python and other programming languages for close to two years. 
> There is an issue that I have been facing since I began applying 
> Python to the analysis of my research work.
>
>
> My question has remained unanswered for months. Has nobody else run 
> into the need to work with data where the regression target consists 
> of multiple outputs that are correlated with each other? This is 
> called a multi-output multivariate problem. A version of random forest 
> that handles multiple outputs is referred to as the multivariate 
> random forest. It is implemented in the R programming language (see 
> the attached reference documentation below).
>
> The scikit-learn random forest actually handles this. It doesn't use 
> the Mahalanobis distance, but that seems like a simple preprocessing step.
>
>>
>> To date, no such package exists in Python. My question is whether 
>> anybody knows how to go about implementing this. The random forest 
>> univariate regression case uses the Euclidean distance as the 
>> measurement criterion, whereas the multivariate regression case 
>> uses the Mahalanobis distance, which takes into account the 
>> inter-relationships between the multiple outputs. I have inquired 
>> about an equivalent capability in Python for many years, but it has 
>> still not been addressed. Such a multivariate random forest mode is 
>> very applicable to the type of research and analysis that I do. Could 
>> someone help, please?
>>
>> Thank you,
>>
>> Paul Ofoche
>>
>> PS: This is an important need for multivariate output analysis as a 
>> technique for solving practical research problems. Here are some 
>> questions posted by various other Python users concerning this same 
>> issue.
>>
>> *https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r*
>>
>> Multi-output regression 
>> <https://stackoverflow.com/questions/49391637/multi-output-regression>
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org  <mailto:scikit-learn at python.org>
>> https://mail.python.org/mailman/listinfo/scikit-learn