[scikit-learn] Can I evaluate clustering efficiency incrementally?

Uri Goren ugoren at gmail.com
Fri May 3 07:27:21 EDT 2019


I usually use clustering to save on labelling costs.
I like to apply hierarchical clustering, then label a small sample and
use those labels to fine-tune the clustering algorithm.

That way, you can evaluate the clustering in terms of cluster purity
(how many clusters contain mixed labels within the labelled sample).
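
To make the purity idea concrete, here is a minimal sketch in plain Python.
The function name `purity_report`, the toy cluster ids and the toy labels
are all made up for illustration; the "purity" value uses the common
majority-label definition, which is my assumption about what is meant here.

```python
from collections import Counter, defaultdict


def purity_report(cluster_ids, sampled_labels):
    """Summarise cluster purity using only a hand-labelled sample.

    cluster_ids    -- cluster id for every point (e.g. output of fit_predict)
    sampled_labels -- dict {point index: label} for the labelled sample
    (Both names are hypothetical, chosen for this sketch.)
    """
    # Group the hand-assigned labels by the cluster each point fell into.
    labels_by_cluster = defaultdict(list)
    for idx, label in sampled_labels.items():
        labels_by_cluster[cluster_ids[idx]].append(label)

    # A cluster counts as "mixed" if its labelled points disagree.
    mixed = [c for c, labels in labels_by_cluster.items()
             if len(set(labels)) > 1]

    # Majority-label purity: share of labelled points that match the
    # most common label of their own cluster.
    majority_hits = sum(Counter(labels).most_common(1)[0][1]
                        for labels in labels_by_cluster.values())

    return {
        "clusters_sampled": len(labels_by_cluster),
        "mixed_clusters": sorted(mixed),
        "purity": majority_hits / len(sampled_labels),
    }


# Toy data: 8 points in 3 clusters, 6 of them hand-labelled.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
sampled_labels = {0: "cat", 1: "cat", 3: "dog", 4: "cat",
                  5: "bird", 7: "bird"}
report = purity_report(cluster_ids, sampled_labels)
print(report)
```

With scikit-learn, `cluster_ids` could come from something like
`AgglomerativeClustering(n_clusters=k).fit_predict(X)`. Many mixed clusters
would suggest trying a larger `n_clusters` or a different `linkage`.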

See an example with sklearn here:
https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU


On Fri, May 3, 2019, 11:03 AM lampahome <pahome.chen at mirlab.org> wrote:

> I see that some algorithms can cluster incrementally when the dataset is
> too large, e.g. MiniBatchKMeans and Birch.
>
> But is there any way to evaluate the clustering incrementally?
>
> Since I don't know the ground-truth labels, I looked at the silhouette
> coefficient and the Calinski-Harabasz index.
> But neither of them can be computed incrementally.