<div dir="auto"><div dir="auto" style="font-family:sans-serif;font-size:12.8px">I usually use clustering to reduce labelling costs.<div dir="auto">I like to apply hierarchical clustering, then label a small sample and use it to fine-tune the clustering algorithm.</div><div dir="auto"><br></div><div dir="auto">That way, you can evaluate the clustering in terms of cluster purity (how many clusters contain mixed labels).</div><div dir="auto"><br></div><div dir="auto">See an example with sklearn here:</div><div dir="auto"><a href="https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU" style="text-decoration-line:none;color:rgb(66,133,244)">https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU</a></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, May 3, 2019, 11:03 AM lampahome &lt;<a href="mailto:pahome.chen@mirlab.org">pahome.chen@mirlab.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I see that some algorithms can cluster incrementally when the dataset is too large, e.g. MiniBatchKMeans and Birch.<div><br></div><div>But is there any way to evaluate incrementally?</div><div><br></div><div>I found the silhouette coefficient and the Calinski-Harabasz index, which I can use because I don't know the ground-truth labels.</div><div>But they can't be evaluated incrementally.</div></div>


</blockquote></div>
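<div dir="auto">A minimal sketch of the purity check described in the reply at the top of this message, in plain Python. It assumes you already have one cluster id per point (for instance from <code>sklearn.cluster.AgglomerativeClustering(...).fit_predict(X)</code>) and hand labels for a small sample, with <code>None</code> for unlabelled points. The function name <code>purity_report</code> and the toy data are hypothetical, and the sketch assumes at least one point is labelled.</div>
<pre>
```python
from collections import Counter, defaultdict


def purity_report(cluster_ids, sample_labels):
    """Summarise how well clusters agree with a small hand-labelled sample.

    cluster_ids[i] is the cluster assigned to point i; sample_labels[i] is its
    hand label, or None if the point was not labelled.
    """
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, sample_labels):
        if label is not None:  # only labelled points take part
            per_cluster[cluster][label] += 1

    # Majority-label count per cluster, summed over clusters.
    majority_total = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    labelled_total = sum(sum(c.values()) for c in per_cluster.values())
    # Clusters whose labelled points carry more than one distinct label.
    mixed = sorted(k for k, c in per_cluster.items() if len(c) != 1)
    return {
        "purity": majority_total / labelled_total,
        "mixed_clusters": mixed,
    }


# Hypothetical toy data: 8 points, 3 clusters, 6 of the points hand-labelled.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
sample_labels = ["a", "a", None, "b", "a", "c", "c", None]
report = purity_report(cluster_ids, sample_labels)
print(report)
```
</pre>
<div dir="auto">Because the report only needs per-cluster label counts, the counters could in principle be updated batch by batch as new labelled points arrive, which may help when the data is clustered incrementally.</div>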