[scikit-learn] PyCM: Multiclass confusion matrix library in Python

Joel Nothman joel.nothman at gmail.com
Mon Jun 4 21:09:57 EDT 2018


>> Thanks for this -- looks useful. I had to write something similar (for
>> the binary case) and wish scikit had something like this.
>
> Which part of it? I'm not entirely sure I understand what the core
> functionality is.
>
I think the core is efficiently evaluating the full set of metrics
appropriate to the kind of task. We now support multi-metric scoring in
things like cross_validate and GridSearchCV (but not in other CV
implementations yet), but:

   1. it's not efficient (there are PRs in progress to work around this,
   but they are definitely work-arounds in the sense that we're still
   repeatedly calling metric functions rather than calculating sufficient
   statistics once; see the sketch after this list), and
   2. we don't have a pre-defined set of scorers appropriate to binary
   classification, or to multiclass classification with 4 classes, one of
   which is the majority "no finding" class, etc.
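
To make the first point concrete: many classification metrics are
derivable from a single confusion matrix, so the sufficient statistics
only need computing once per fold. A rough sketch (the helper below is
hypothetical, not anything scikit-learn currently provides):

import numpy as np
from sklearn.metrics import confusion_matrix

def metrics_from_confusion(y_true, y_pred):
    # compute the confusion matrix once per fold...
    cm = confusion_matrix(y_true, y_pred)
    tp = np.diag(cm)
    with np.errstate(divide='ignore', invalid='ignore'):
        precision = tp / cm.sum(axis=0)  # per predicted class
        recall = tp / cm.sum(axis=1)     # per true class
    # ...and derive many metrics from it without touching
    # y_true/y_pred again
    return {'accuracy': tp.sum() / cm.sum(),
            'precision_macro': np.nanmean(precision),
            'recall_macro': np.nanmean(recall)}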

But assuming we could solve or work around the first issue, an
interface, in the core library or elsewhere, that gave us a series of
appropriately-named scorers for different task types might be neat and
would avoid code that a lot of people repeat:

from sklearn.metrics import (cohen_kappa_score, make_scorer,
                             matthews_corrcoef, precision_score)

def get_scorers_for_binary(pos_label, neg_label, proba_thresholds=(0.5,)):
    return {'precision:p>0.5': make_scorer(precision_score,
                                           pos_label=pos_label),
            'accuracy:p>0.5': 'accuracy',
            'roc_auc': 'roc_auc',
            'log_loss': 'neg_log_loss',  # the current built-in scorer name
            ...
            }
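
Usage would then just be a matter of passing the dict through, e.g.
(assuming scikit-learn >= 0.19's cross_validate, and that the ... above
has been filled in):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(random_state=0)
results = cross_validate(LogisticRegression(), X, y,
                         scoring=get_scorers_for_binary(pos_label=1,
                                                        neg_label=0))
# results then has one test_<name> entry per scorer,
# e.g. results['test_roc_auc']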

def get_scorers_for_multiclass(pos_labels, neg_labels=()):
    out = {'accuracy': 'accuracy',
           'mcc': make_scorer(matthews_corrcoef),
           'cohen_kappa': make_scorer(cohen_kappa_score),
           'precision_macro': make_scorer(precision_score,
                                          labels=pos_labels,
                                          average='macro'),
           'precision_weighted': make_scorer(precision_score,
                                             labels=pos_labels,
                                             average='weighted'),
           ...}
    if neg_labels:
        # micro-averaged precision differs from accuracy only when
        # some labels are excluded
        out['precision_micro'] = make_scorer(precision_score,
                                             labels=pos_labels,
                                             average='micro')
        ...
    return out
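
And similarly for model selection, where multi-metric scoring means
naming the metric to refit on (again hypothetical, building on the
sketch above):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), param_grid={'C': [0.1, 1, 10]},
                      scoring=get_scorers_for_multiclass([0, 1, 2]),
                      refit='mcc')  # which metric picks the best params
search.fit(X, y)
# cv_results_ then gains mean_test_<name> columns for every scorer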


I note some risk of encouraging bad practice around multiple hypothesis
testing, etc., but generally I think this would be helpful to users.