[scikit-learn] PyCM: Multiclass confusion matrix library in Python

Brown J.B. jbbrown at kuhp.kyoto-u.ac.jp
Mon Jun 4 11:56:22 EDT 2018


Hello community,

>> I wonder if there's something similar for the binary class case where
>> the prediction is a real value (activation) and from this we can also
>> derive
>>   - CMs for all prediction cutoffs (or a set of cutoffs?)
>>   - scores over all cutoffs (AUC, AP, ...)
>>
> AUC and AP are by definition over all cut-offs. And CMs for all
> cutoffs doesn't seem a good idea, because that'll be n_samples many
> in the general case. If you want to specify a set of cutoffs, that would
> be pretty easy to do.
> How do you find these cut-offs, though?
>
>>
>> For me, in analyzing (binary class) performance, reporting scores for
>> a single cutoff is less useful than seeing how the many scores (tpr,
>> ppv, mcc, relative risk, chi^2, ...) vary at various false positive
>> rates, or prediction quantiles.
>>
>
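For the "specify a set of cutoffs" point above, a minimal sketch of what that
could look like (the helper name and toy data are my own illustration, not an
existing scikit-learn API):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def cms_at_cutoffs(y_true, y_score, cutoffs):
    """Return {cutoff: 2x2 confusion matrix} by binarizing the
    real-valued scores at each user-specified threshold."""
    return {t: confusion_matrix(y_true, (y_score >= t).astype(int))
            for t in cutoffs}

# Toy example: true labels and continuous predictions (activations).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])

for t, cm in cms_at_cutoffs(y_true, y_score, [0.3, 0.5, 0.7]).items():
    print(t, cm.ravel())  # order for binary labels: tn, fp, fn, tp
```

Passing an explicit list of cutoffs keeps this to a handful of matrices rather
than the n_samples-many you would get from every distinct score.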
In terms of finding cut-offs, one could use the idea of metric surfaces
that I recently proposed
(https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.201700127)
and then plot your per-threshold TPR/TNR pairs on the PPV/MCC/etc.
surfaces to determine what conditions you are willing to accept against
the background of your prediction problem.
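As a rough sketch of that idea (this is my own minimal illustration, not the
paper's implementation; the 20% prevalence and the toy scores are assumptions),
one can write MCC purely in terms of TPR, TNR, and class prevalence, evaluate
it over the (TPR, TNR) unit square, and then overlay a model's per-threshold
operating points from `roc_curve`:

```python
import numpy as np
from sklearn.metrics import roc_curve

def mcc_surface(tpr, tnr, prevalence):
    """MCC expressed via TPR, TNR, and class prevalence, so it can be
    evaluated over the whole (TPR, TNR) unit square."""
    tp = prevalence * tpr
    fn = prevalence * (1.0 - tpr)
    tn = (1.0 - prevalence) * tnr
    fp = (1.0 - prevalence) * (1.0 - tnr)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero
    return np.where(denom > 0, (tp * tn - fp * fn) / safe, 0.0)

# Surface on a TPR/TNR grid at an assumed 20% prevalence.
tpr_grid, tnr_grid = np.meshgrid(np.linspace(0, 1, 101),
                                 np.linspace(0, 1, 101))
surface = mcc_surface(tpr_grid, tnr_grid, prevalence=0.2)

# Per-threshold (TPR, TNR) operating points for a model (toy scores);
# these pairs are what you would plot on top of the surface.
y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.9, 0.6, 0.4, 0.7])
fpr, tpr_pts, _ = roc_curve(y_true, y_score)
tnr_pts = 1.0 - fpr
```

Because MCC is scale-invariant, the prevalence fractions stand in for the raw
counts; the same recipe works for PPV or any other confusion-matrix-derived
score.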

I use these surfaces (a) to think about the prediction problem before any
attempt at modeling is made, and (b) to deconstruct results such as
"Accuracy=85%" into interpretations in the context of my field and the data
being predicted.

Hope this contributes a bit of food for thought.
J.B.