[scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)

Andreas Mueller t3kcit at gmail.com
Thu Feb 13 23:13:37 EST 2020



On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:
>
> Hello all,
>
> My name is Paul and I am enthused about data science. I have been 
> using Python and other programming languages for close to two years. 
> There is an issue that I have been facing since I began applying 
> Python to the analysis of my research work.
>
>
> My question has remained unanswered for months. Has no one else run 
> into the need to work with data where the regression output is 
> multivariate, with the output parameters correlated with each other? 
> This is called a multi-output multivariate problem. A version of 
> random forest that handles multiple outputs is referred to as the 
> multivariate random forest. It is implemented in the programming 
> language R (see attached reference documentation below).
>
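The scikit-learn random forest actually handles this: RandomForestRegressor 
accepts a 2-D target of shape (n_samples, n_outputs) and fits all outputs 
jointly. It doesn't use the Mahalanobis distance, but that seems like a 
simple preprocessing step.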
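A minimal sketch of that preprocessing step, assuming "preprocessing" means whitening the targets with one global covariance matrix estimated from the training data. RandomForestRegressor and its multi-output fit are real scikit-learn API; the WhitenedTargetForest wrapper is hypothetical. With Sigma = L L^T (Cholesky) and z = L^-1 (y - mean), squared Euclidean distance between whitened vectors z equals squared Mahalanobis distance between the original vectors y. Because the forest's squared-error criterion averages per-output squared errors, fitting on z makes the splits minimize a Mahalanobis-type error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


class WhitenedTargetForest:
    """Hypothetical wrapper (not a scikit-learn class): fits a multi-output
    RandomForestRegressor on whitened targets so that its squared-error split
    criterion measures Mahalanobis rather than Euclidean distance."""

    def __init__(self, **forest_kwargs):
        # All keyword arguments are passed straight to RandomForestRegressor.
        self.forest = RandomForestRegressor(**forest_kwargs)

    def fit(self, X, Y):
        Y = np.asarray(Y, dtype=float)              # shape (n_samples, n_outputs)
        self.mean_ = Y.mean(axis=0)
        # Assumption: one global covariance estimated from the training targets.
        cov = np.cov(Y, rowvar=False)
        self.chol_ = np.linalg.cholesky(cov)        # cov = L @ L.T, L lower-triangular
        # Whiten every row at once: z = L^{-1} (y - mean).
        Z = np.linalg.solve(self.chol_, (Y - self.mean_).T).T
        self.forest.fit(X, Z)                       # native multi-output fit
        return self

    def predict(self, X):
        Z_hat = self.forest.predict(X)              # shape (n_samples, n_outputs)
        # Undo the whitening: y = L z + mean, written row-wise as Z @ L.T + mean.
        return Z_hat @ self.chol_.T + self.mean_


# Usage on synthetic data with two strongly correlated outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
base = X[:, 0] + 0.5 * X[:, 1]
Y = np.column_stack([base + 0.1 * rng.normal(size=300),
                     2.0 * base + 0.1 * rng.normal(size=300)])

model = WhitenedTargetForest(n_estimators=50, random_state=0).fit(X, Y)
Y_pred = model.predict(X[:5])
print(Y_pred.shape)  # (5, 2)
```

Forest predictions are averages of leaf targets, and averaging commutes with the affine map y = L z + mean, so the back-transformed predictions are exactly leaf averages of the original targets; only the choice of splits changes. This uses a single global covariance rather than a node-specific one, so it may not match the criterion used by the R package.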
>
>
> To date, there is no such package in Python. My question is whether 
> anybody knows how to go about implementing this. The univariate random 
> forest regression case uses the Euclidean distance as the measurement 
> criterion, whereas the multivariate regression case uses the 
> Mahalanobis distance, which takes into account the inter-relationships 
> between the multiple outputs. I have inquired about an equivalent 
> capability in Python for many years, but it has still not been 
> addressed. Such a multivariate random forest model is very applicable 
> to the type of research and analysis that I do. Could someone help, 
> please?
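For reference, the link between the two distances mentioned above can be written out. Assuming the outputs have mean mu and covariance Sigma with Cholesky factor L (Sigma = L L^T), the Mahalanobis distance between two output vectors becomes a plain Euclidean distance after whitening:

```latex
\begin{aligned}
d_M^2(\mathbf{y}, \mathbf{y}')
  &= (\mathbf{y}-\mathbf{y}')^\top \Sigma^{-1} (\mathbf{y}-\mathbf{y}')
   = (\mathbf{y}-\mathbf{y}')^\top L^{-\top} L^{-1} (\mathbf{y}-\mathbf{y}') \\
  &= \lVert L^{-1}(\mathbf{y}-\boldsymbol{\mu}) - L^{-1}(\mathbf{y}'-\boldsymbol{\mu}) \rVert_2^2
   = \lVert \mathbf{z} - \mathbf{z}' \rVert_2^2,
  \qquad \mathbf{z} = L^{-1}(\mathbf{y}-\boldsymbol{\mu}).
\end{aligned}
```

So a tree that scores splits by squared Euclidean error on the whitened outputs z is, in effect, scoring squared Mahalanobis error on the original outputs y, under the assumption of a single covariance Sigma shared by all nodes.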
>
> Thank you,
>
> Paul Ofoche
>
> PS: This is an important need for multivariate output analysis as a 
> technique for solving practical research problems. Here are some 
> questions posted by other Python users about this same issue.
>
> https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r
>
> Multi-output regression 
> <https://stackoverflow.com/questions/49391637/multi-output-regression>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
