top N accuracy classification metric
Hi all,
It's common to use a top-N accuracy metric for multi-class classification problems. For each observation the prediction is a vector of predicted probabilities, one per class, and a prediction is top-N accurate if the correct class is among the N classes with the highest predicted probabilities. I've written a simple implementation, but I don't think it quite fits the sklearn API. Specifically, _check_targets objects to the continuous-multioutput format of the predictions for a classification task. Is there any interest in including a metric like this? I'd be happy to submit a pull request.
Jeremiah
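The definition above can be sketched in NumPy as follows. This is a minimal illustration, not Jeremiah's implementation (which isn't shown in the thread) and not a scikit-learn API; the function name top_n_accuracy is hypothetical, and it assumes y_true holds integer class indices that match the column order of the probability matrix.

```python
import numpy as np


def top_n_accuracy(y_true, y_proba, n=5):
    """Hypothetical helper: fraction of samples whose true class is among
    the n highest-probability classes.

    Assumes y_true holds integer class indices 0..n_classes-1 that line up
    with the columns of y_proba, which has shape (n_samples, n_classes).
    """
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)
    if y_proba.ndim != 2 or y_proba.shape[0] != y_true.shape[0]:
        raise ValueError("y_proba must have shape (n_samples, n_classes)")
    if not 1 <= n <= y_proba.shape[1]:
        raise ValueError("n must be between 1 and n_classes")
    # Column indices of the n largest scores in each row; their order
    # among themselves does not matter for this metric.
    top_n = np.argpartition(y_proba, -n, axis=1)[:, -n:]
    # A sample counts as a hit if its true class appears in its top-n set.
    hits = (top_n == y_true[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data (illustrative only): 4 samples, 3 classes, no ties within a row.
y_true = np.array([0, 1, 2, 2])
y_proba = np.array([[0.5, 0.2, 0.3],
                    [0.1, 0.6, 0.3],
                    [0.2, 0.5, 0.3],
                    [0.7, 0.2, 0.1]])

print(top_n_accuracy(y_true, y_proba, n=1))  # 0.5
print(top_n_accuracy(y_true, y_proba, n=2))  # 0.75
print(top_n_accuracy(y_true, y_proba, n=3))  # 1.0
```

When no row contains ties, n=1 reduces to ordinary accuracy on the argmax prediction. np.argpartition avoids a full sort of each row; ties at the n-th position are broken arbitrarily, so a real implementation would need to document a tie policy.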
There are metrics with that kind of input in sklearn.metrics.ranking. I don't have time to look them up now, but there have been proposals and PRs for similar ranking metrics. Please search the issue tracker for related issues.
Thanks,
Joel
In reply to Johnson, Jeremiah <Jeremiah.Johnson@unh.edu>, 21 January 2017 at 06:16
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
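Joel's pointer to the ranking metrics can be checked against scikit-learn itself. The sketch below assumes scikit-learn 0.24 or later, which provides sklearn.metrics.top_k_accuracy_score. It shows that sklearn.utils.multiclass.type_of_target labels a probability matrix as continuous-multioutput (the format _check_targets objects to), that accuracy_score therefore raises a ValueError, and that top_k_accuracy_score accepts the same matrix. The toy arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, top_k_accuracy_score
from sklearn.utils.multiclass import type_of_target

# Toy data (illustrative only): 4 samples, 3 classes, no ties within a row.
y_true = np.array([0, 1, 2, 2])
y_score = np.array([[0.5, 0.2, 0.3],
                    [0.1, 0.6, 0.3],
                    [0.2, 0.5, 0.3],
                    [0.7, 0.2, 0.1]])

# Classification metrics validate inputs by their inferred target type;
# a 2-D float score matrix is inferred as continuous-multioutput.
print(type_of_target(y_true))   # multiclass
print(type_of_target(y_score))  # continuous-multioutput

# Mixing a multiclass y_true with a score matrix is rejected.
try:
    accuracy_score(y_true, y_score)
except ValueError as exc:
    print("accuracy_score rejected the scores:", exc)

# top_k_accuracy_score takes the score matrix directly.
print(top_k_accuracy_score(y_true, y_score, k=2))  # 0.75
```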
Okay, I didn't see anything equivalent in the issue tracker, so I submitted a pull request.
Jeremiah
Jeremiah W. Johnson, Ph.D.
Assistant Professor of Data Science Analytics Bachelor of Science Program Coordinator
University of New Hampshire
http://linkedin.com/jwjohnson314
In reply to Joel Nothman <joel.nothman@gmail.com>, Saturday, January 21, 2017 5:52 AM (Subject: Re: [scikit-learn] top N accuracy classification metric)
Participants (2): Joel Nothman; Johnson, Jeremiah