[scikit-learn] random forests and multi-class probability

Guillaume Lemaître g.lemaitre58 at gmail.com
Tue Jul 27 06:02:23 EDT 2021


As far as I remember, `precision_recall_curve` and `roc_curve` do not support multiclass targets. They are designed to work only with binary classification.
However, we provide an example for precision-recall that shows one way to compute a multiclass precision-recall curve via averaging: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py
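
A minimal sketch of that idea, not taken from the linked example: it assumes the iris dataset and a random forest purely for illustration. The forest is fitted once on the multiclass target; only the labels are binarized, so that each column of `predict_proba` can be scored against a "class k vs. the rest" indicator.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Illustrative data and model (assumptions, not from the thread).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# One probability column per class, in the order given by clf.classes_.
y_score = clf.predict_proba(X_test)
# One 0/1 indicator column per class, in the same order.
y_test_bin = label_binarize(y_test, classes=clf.classes_)

# One binary precision-recall curve per class (class k vs. the rest).
curves = {}
for k, label in enumerate(clf.classes_):
    precision, recall, _ = precision_recall_curve(y_test_bin[:, k], y_score[:, k])
    curves[label] = (precision, recall)

# Micro-average: pool every (sample, class) pair into one binary problem.
precision_micro, recall_micro, _ = precision_recall_curve(
    y_test_bin.ravel(), y_score.ravel()
)
ap_micro = average_precision_score(y_test_bin, y_score, average="micro")
```

The same binarized labels and score columns can be passed to `roc_curve` one class at a time.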
--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/

> On 27 Jul 2021, at 11:42, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:
> 
> Thank you!
> 
> So when the multiclass documentation says that the algorithms with intrinsic multiclass support, which are listed here <https://scikit-learn.org/stable/modules/multiclass.html>, do not need to be wrapped in OneVsRest, it means there is no need because they can indeed handle multiclass, each in its own way.
> 
> But if I want to plot PR curves or ROC curves, then I do need to wrap them, because those metrics are calculated in a one-vs-rest manner, and this is not how the algorithms handle it. Is my understanding correct?
> 
> Thank you!
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Tuesday, July 27th, 2021 at 11:33 AM, Nicolas Hug <niourf at gmail.com> wrote:
>> To add to Guillaume's answer: the native multiclass support for forests/trees is described here: https://scikit-learn.org/stable/modules/tree.html#multi-output-problems
>> It's not a one-vs-rest strategy and can be summed up as:
>> 
>> 
>> - Store n output values in leaves, instead of 1;
>> - Use splitting criteria that compute the average reduction across all n outputs.
>> 
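
As a rough illustration of "several values per leaf", the sketch below fits a single tree on a 3-class target and inspects the shape of the fitted `tree_.value` array. The dataset and the `max_depth` setting are assumptions chosen only to keep the tree small; whether the stored values are counts or fractions depends on the scikit-learn version, so only the shape is examined.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Illustrative 3-class problem (an assumption, not from the thread).
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Shape is (node_count, n_outputs, n_classes): every node, leaves included,
# holds one value per class rather than a single value.
value_shape = tree.tree_.value.shape
n_values_per_node = value_shape[2]

# A single tree therefore yields one probability column per class directly.
tree_proba = tree.predict_proba(X)
```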
>> 
>> Nicolas
>> 
>> On 27/07/2021 10:22, Guillaume Lemaître wrote:
>>>> On 27 Jul 2021, at 11:08, Sole Galli via scikit-learn <scikit-learn at python.org> wrote:
>>>> 
>>>> Hello community,
>>>> 
>>>> Do I understand correctly that Random Forests are trained one-vs-rest when the target has more than 2 classes? Say the target takes values 0, 1 and 2; would the model then train 3 estimators, one per class, under the hood?
>>> Each decision tree of the forest natively supports multiclass targets.
>>> 
>>>> The predict_proba output is an array with 3 columns, containing the probability of each class. If it is one-vs-rest, am I correct to assume that the probabilities for the 3 classes do not necessarily add up to 1? Are they normalized? How is it done so that they do add up to 1?
>>> Given the answer above, each row of the array returned by `predict_proba` sums to 1.
>>> According to the documentation, the probabilities are computed as follows:
>>> 
>>> The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
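
The documented behaviour can be checked with a short sketch. The dataset and the number of trees are assumptions for illustration; `estimators_` is the fitted forest's list of trees.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Illustrative 3-class problem (an assumption, not from the thread).
X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Forest-level probabilities: one row per sample, one column per class.
proba = forest.predict_proba(X)

# Average the per-tree probabilities by hand for comparison.
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
mean_of_trees = per_tree.mean(axis=0)

# Each tree's row is a set of class fractions in a leaf, so it sums to 1;
# the mean of such rows sums to 1 as well, with no extra normalization step.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
matches_tree_mean = np.allclose(proba, mean_of_trees)
```

Note that the forest holds `n_estimators` trees in total, not one set of trees per class.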
>>> 
>>>> Thank you
>>>> Sole
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn at python.org <mailto:scikit-learn at python.org>
>>>> https://mail.python.org/mailman/listinfo/scikit-learn <https://mail.python.org/mailman/listinfo/scikit-learn>
> 
