<div dir="ltr"><div dir="ltr"></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Uri Goren &lt;<a href="mailto:ugoren@gmail.com">ugoren@gmail.com</a>&gt; wrote on Fri, May 3, 2019 at 7:29 PM:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto"><div dir="auto" style="font-family:sans-serif;font-size:12.8px">I usually use clustering to save costs on labelling.<div dir="auto">I like to apply hierarchical clustering, and then label a small sample and fine-tune the clustering algorithm.</div><div dir="auto"><br></div><div dir="auto">That way, you can evaluate the effectiveness in terms of cluster purity (how many clusters contain mixed labels).</div><div dir="auto"><br></div><div dir="auto">See an example with sklearn here:</div><div dir="auto"><a href="https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU" style="text-decoration-line:none;color:rgb(66,133,244)" target="_blank">https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU</a></div></div><br></div><br></blockquote><div>But will it work if my dataset is too large to load into memory?</div><div> </div></div></div>
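<div>For reference, the cluster purity check mentioned in the quoted message could be sketched roughly as below. This is only an illustration, not the code from the linked video: the function name <code>cluster_purity</code> and the sample data are hypothetical, and the cluster ids are assumed to come from whatever clustering step was run beforehand (for instance sklearn's <code>AgglomerativeClustering</code>).</div>

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Return (purity, mixed_clusters) for a hand-labelled sample.

    cluster_ids[i] is the cluster assigned to sample i and labels[i] is
    its manual label. Only the labelled samples should be passed in.
    (Hypothetical helper, written for illustration.)
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one labelled sample")

    # Count how often each label occurs inside each cluster.
    per_cluster = defaultdict(Counter)
    for cid, lab in zip(cluster_ids, labels):
        per_cluster[cid][lab] += 1

    # Purity: fraction of samples carrying their cluster's majority label.
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    purity = majority_total / len(labels)

    # Clusters that contain more than one distinct label.
    mixed = sorted(cid for cid, c in per_cluster.items() if len(c) > 1)
    return purity, mixed


# Made-up labelled sample: 6 points, 3 clusters, 2 labels.
sample_clusters = [0, 0, 1, 1, 1, 2]
sample_labels = ["spam", "spam", "ham", "ham", "spam", "ham"]
purity, mixed = cluster_purity(sample_clusters, sample_labels)
print(f"purity={purity:.2f}, mixed clusters={mixed}")
```

<div>Because the check only looks at the small labelled sample, its cost depends on the sample size and not on the size of the full dataset.</div>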