[scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)
Paul Chike Ofoche
tchyk2001 at yahoo.com
Fri Feb 14 07:37:44 EST 2020
Scikit-learn's random forest does not handle the multi-output case; it only maps to each output one at a time and therefore does not account for the correlation between the outputs, which is what the Mahalanobis distance does. I, as well as other researchers, have observed this issue for as long as two years. Could a solution be implemented in RandomForest, since Python already has a function that computes Mahalanobis distances?
On Thursday, February 13, 2020, 10:15:11 PM CST, Andreas Mueller <t3kcit at gmail.com> wrote:
On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:
Hello all,
My name is Paul and I am enthused about data science. I have been using Python and other programming languages for close to two years. There is an issue that I have been facing since I began applying Python to the analysis of my research work.
My question has remained unanswered for months. Has nobody else run into the need to work with data in which the regression result has multiple outputs that are correlated with each other? This is called a multi-output multivariate problem. A version of random forest that handles multiple outputs is referred to as the multivariate random forest. It is implemented in R (see the reference documentation attached below).
The scikit-learn random forest actually handles this. It doesn't use the Mahalanobis distance, but that seems like a simple preprocessing step.
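A minimal sketch of that suggestion, under stated assumptions: the data are synthetic, the names `X`, `Y`, `Z`, `L` and `forest` are illustrative, and the outputs are whitened with the Cholesky factor of their sample covariance, which is one possible reading of "preprocessing step" rather than anything the thread specifies. After whitening, squared Euclidean distance between transformed outputs equals squared Mahalanobis distance between the original ones, so the forest's usual squared-error criterion operates in that metric.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 3 features, 2 correlated outputs.
X = rng.normal(size=(200, 3))
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 2]]) + 0.3 * noise

# Whiten the outputs. With cov(Y) = L @ L.T, the rows of Z have identity
# sample covariance, so Euclidean distance in Z is Mahalanobis distance in Y.
mean = Y.mean(axis=0)
L = np.linalg.cholesky(np.cov(Y, rowvar=False))
Z = np.linalg.solve(L, (Y - mean).T).T

# RandomForestRegressor accepts a 2-D target of shape (n_samples, n_outputs).
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, Z)

# Undo the whitening to get predictions in the original output space.
Y_pred = forest.predict(X) @ L.T + mean
```

Whether the marginal covariance of the outputs is the right covariance to use here is a modelling choice and is assumed, not established, by this sketch.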
To date, no such package exists in Python. My question is whether anybody knows how to go about implementing this. The univariate random forest regression case uses the Euclidean distance as its measurement criterion, whereas the multivariate regression case uses the Mahalanobis distance, which takes into account the inter-relationships between the multiple outputs. I have inquired about an equivalent capability in Python for many years, but it has still not been addressed. Such a multivariate random forest mode is very applicable to the type of research and analysis that I do. Could someone help, please?
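To illustrate the difference between the two distances, here is a small sketch using `scipy.spatial.distance`. The covariance matrix and the points are made up for illustration. Two points at the same Euclidean distance from the origin get very different Mahalanobis distances once the correlation between the two outputs is taken into account.

```python
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

# Hypothetical covariance of two strongly correlated outputs.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
VI = np.linalg.inv(cov)  # mahalanobis() expects the inverse covariance

origin = np.zeros(2)
along = np.array([1.0, 1.0])     # moves with the correlation
against = np.array([1.0, -1.0])  # moves against the correlation

# Euclidean distance ignores the correlation: both points are equally far.
d_euc_along = euclidean(origin, along)
d_euc_against = euclidean(origin, against)

# Mahalanobis distance penalises the direction the data rarely vary in.
d_mah_along = mahalanobis(origin, along, VI)
d_mah_against = mahalanobis(origin, against, VI)

print(d_euc_along, d_euc_against)
print(d_mah_along, d_mah_against)
```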
Thank you,
Paul Ofoche
PS: This is an important need for multivariate output analysis as a technique for solving practical research problems. Here are some questions posted by other Python users concerning this same issue.
https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn