[scikit-learn] ROC convex hulls design question

Sat Oct 2 11:25:25 EDT 2021

Dear sklearn mailing list,

I love all the wonderful ways scikit-learn has made good practices in ML more accessible to so many! Thanks for all of that!

Is there a design reason the default ROC computation (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html) doesn’t return the convex hull of the ROC curve?

In the default ROC computation, the resulting curves generally don’t lie on their convex hulls (https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.ConvexHull.html), even though every point on the convex hull is achievable performance: any point on a hull segment can be reached by randomly choosing between the two operating points at its ends. In that sense the default ROCs returned are suboptimal. That’s a point made in Tom Fawcett’s ROC 101 paper (https://www.math.ucdavis.edu/~saito/data/roc/fawcett-roc.pdf), which is cited in the sklearn docs.

He writes: “More generally, a classifier is potentially optimal if and only if it lies on the convex hull of the set of points in ROC space. The convex hull of the set of points in ROC space is called the ROC convex hull (ROCCH) of the corresponding set of classifiers.”
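
To make the ROCCH concrete, here is a minimal sketch of computing the upper hull of a set of ROC points. The names roc_convex_hull and trapezoid_auc are hypothetical helpers for illustration, not part of scikit-learn, and the sketch uses a plain monotone-chain scan rather than scipy.spatial.ConvexHull. It assumes fpr and tpr are equal-length sequences of values in [0, 1].

```python
def _cross(o, a, b):
    # Z-component of (a - o) x (b - o); negative means a clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(fpr, tpr):
    """Return the upper convex hull of ROC points as (hull_fpr, hull_tpr).

    Hypothetical helper: a sketch under the assumptions stated above.
    """
    # Always include the trivial classifiers (0, 0) and (1, 1), drop
    # duplicates, and sort left to right (ties broken by increasing tpr).
    points = sorted(set(zip(fpr, tpr)) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or below the segment
        # joining its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return [x for x, _ in hull], [y for _, y in hull]


def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear curve; fpr must be non-decreasing."""
    pairs = list(zip(fpr, tpr))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]))


# Made-up ROC points for illustration only.
raw_fpr = [0, 0.1, 0.3, 0.5, 1]
raw_tpr = [0, 0.5, 0.6, 0.9, 1]
hull_fpr, hull_tpr = roc_convex_hull(raw_fpr, raw_tpr)
print(list(zip(hull_fpr, hull_tpr)))
print(trapezoid_auc(raw_fpr, raw_tpr), trapezoid_auc(hull_fpr, hull_tpr))
```

In practice the fpr and tpr inputs could be the first two arrays returned by sklearn.metrics.roc_curve. The area under the hull is never smaller than the area under the raw curve, since the hull lies on or above every original point.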

Apologies if this is already answered somewhere else… I searched and could only find this apparently abandoned repo: https://github.com/tfawcett/pycost

I’ve implemented an ROC convex hull myself and have found significant improvements in my performance estimates just from using it. Was there some reason this wasn’t implemented as the default?

Thanks,
-johnk-
