Just a skeptical comment from a bystander.

I only skimmed parts of the article. My impression is that this does not apply (directly) to the regression setting. As far as I understand, they assume that all observations have the same probability.

To me it looks more like the literature on testing of, or confidence intervals for, a single proportion.

I might be wrong.

Josef
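P.S. For what it's worth, the kind of single-proportion interval I have in mind is a one-liner in statsmodels; a quick sketch with made-up counts:

    # Made-up counts: 85 "successes" out of 100 trials.
    from statsmodels.stats.proportion import proportion_confint

    # Wilson score interval; method can also be 'normal', 'beta',
    # 'agresti_coull', or 'jeffreys'.
    low, high = proportion_confint(count=85, nobs=100, alpha=0.05,
                                   method='wilson')
    print(low, high)  # roughly (0.77, 0.91)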

On Thu, Feb 7, 2019 at 11:00 AM Andreas Mueller <t3kcit@gmail.com> wrote:

The paper definitely looks interesting and the authors are certainly some giants in the field.
But it is actually not widely cited (139 citations since 2005), and I've never seen it used.

I don't know why that is, and looking at the citations there doesn't seem to be a lot of follow-up work. I think this would need more validation before getting into sklearn.

Sebastian: the approach in this paper is distribution-independent and doesn't require bootstrapping, so it does indeed look quite nice.

On 2/6/19 1:19 PM, Sebastian Raschka wrote:
> Hi Stuart,
>
> I don't think so, because there is no standard way to compute CIs. That goes for all performance measures (accuracy, precision, recall, etc.): some people use simple binomial approximation intervals, others prefer bootstrapping, and so on. It also depends on the data you have; for large datasets, binomial approximation intervals may be sufficient and bootstrapping too expensive.
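
For concreteness, the binomial (normal) approximation interval mentioned above takes only a few lines; the counts below are made up:

    import numpy as np

    # Made-up counts: 850 correct test predictions out of 1000.
    n_correct, n_total = 850, 1000
    p_hat = n_correct / n_total
    z = 1.96  # two-sided 95% normal quantile
    half_width = z * np.sqrt(p_hat * (1 - p_hat) / n_total)
    print(p_hat - half_width, p_hat + half_width)  # ~ (0.828, 0.872)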
>
> Thanks for sharing that paper btw, will have a look.
>
> Best,
> Sebastian
>
>
>> On Feb 6, 2019, at 11:28 AM, Stuart Reynolds <stuart@stuartreynolds.net> wrote:
>>
>> https://papers.nips.cc/paper/2645-confidence-intervals-for-the-area-under-the-roc-curve.pdf
>> Does scikit-learn (or other Python libraries) provide functions to measure the confidence interval of AUROC scores? The same question applies to mean average precision.
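
One way to get such an interval without dedicated library support is a percentile bootstrap over the test set; a minimal sketch with synthetic labels and scores (the same loop works for average_precision_score):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.RandomState(0)
    # Synthetic test set: binary labels and classifier scores.
    y_true = rng.randint(0, 2, size=500)
    y_score = y_true * 0.5 + rng.rand(500)  # scores correlated with labels

    # Percentile bootstrap: resample (label, score) pairs with replacement.
    n_boot = 1000
    aucs = []
    for _ in range(n_boot):
        idx = rng.randint(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:
            continue  # AUC is undefined if a resample has only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    low, high = np.percentile(aucs, [2.5, 97.5])
    print("95%% bootstrap CI: (%.3f, %.3f)" % (low, high))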
>>
>> It seems like this should be a standard results-reporting practice if a method is available.
>>
>> - Stuart