[scikit-learn] Can I evaluate clustering efficiency incrementally?
tom.augspurger88 at gmail.com
Tue May 14 09:18:24 EDT 2019
If anyone is interested in implementing these, dask-ml would welcome
metrics that work well with Dask arrays:
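The thread's underlying question is whether clustering quality can be checked on data too large for memory. Purity only needs contingency counts, and counts from separate chunks simply add. So one possible approach is to process one labelled chunk at a time and keep a running total. This is a minimal sketch, not dask-ml or scikit-learn code. `IncrementalPurity` and the `chunks()` generator are hypothetical names. It assumes integer class labels in `range(n_classes)` and cluster ids in `range(n_clusters)`, for example predictions from a model fitted chunk by chunk with `MiniBatchKMeans.partial_fit`.

```python
import numpy as np


class IncrementalPurity:
    """Hypothetical helper: running contingency counts over chunked data."""

    def __init__(self, n_classes, n_clusters):
        # Rows are true classes, columns are predicted clusters.
        self.counts = np.zeros((n_classes, n_clusters), dtype=np.int64)

    def update(self, labels_true, labels_pred):
        # Add one to the cell of each (true label, predicted cluster) pair in this chunk.
        np.add.at(self.counts, (np.asarray(labels_true), np.asarray(labels_pred)), 1)
        return self

    def purity(self):
        # Fraction of samples that match the majority label of their cluster.
        total = self.counts.sum()
        if total == 0:
            raise ValueError("no samples seen yet")
        return self.counts.max(axis=0).sum() / total

    def mixed_clusters(self):
        # Number of clusters whose members carry more than one distinct label.
        return int(((self.counts > 0).sum(axis=0) > 1).sum())


def chunks():
    # Hypothetical stand-in for reading (labels, cluster ids) batches from disk or Dask blocks.
    yield np.array([0, 0, 1]), np.array([0, 0, 1])
    yield np.array([1, 2, 2]), np.array([1, 1, 2])


scorer = IncrementalPurity(n_classes=3, n_clusters=3)
for y_true, y_pred in chunks():
    scorer.update(y_true, y_pred)
print(scorer.purity(), scorer.mixed_clusters())
```

Because the per-chunk matrices are additive, partial counts computed on separate workers or array blocks could also be summed before calling `purity()`. That additivity is what makes this kind of metric a natural fit for chunked arrays.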
On Tue, May 14, 2019 at 2:09 AM Uri Goren <ugoren at gmail.com> wrote:
> Sounds like you need to use Spark;
> this project looks promising:
> On Tue, May 14, 2019 at 5:12 AM lampahome <pahome.chen at mirlab.org> wrote:
>> On Fri, May 3, 2019 at 7:29 PM Uri Goren <ugoren at gmail.com> wrote:
>>> I usually use clustering to save costs on labelling.
>>> I like to apply hierarchical clustering, and then label a small sample
>>> and fine-tune the clustering algorithm.
>>> That way, you can evaluate the effectiveness in terms of cluster purity
>>> (how many clusters contain mixed labels).
>>> See an example with sklearn here:
>>> But if my dataset is too large to load into memory, will it work?
>> scikit-learn mailing list
>> scikit-learn at python.org
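The purity check Uri describes (cluster everything, hand-label a small sample, then see how many clusters mix labels) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the code from the linked example. `contingency`, `cluster_purity` and `count_mixed_clusters` are hypothetical helper names. It assumes you already have cluster assignments for the labelled sample, for instance from a hierarchical clustering model.

```python
import numpy as np


def contingency(labels_true, labels_pred):
    """Count matrix: rows are true labels, columns are predicted clusters."""
    _, true_idx = np.unique(labels_true, return_inverse=True)
    _, pred_idx = np.unique(labels_pred, return_inverse=True)
    counts = np.zeros((true_idx.max() + 1, pred_idx.max() + 1), dtype=np.int64)
    # Each (true label, predicted cluster) pair adds one to its cell.
    np.add.at(counts, (true_idx, pred_idx), 1)
    return counts


def cluster_purity(labels_true, labels_pred):
    """Fraction of labelled samples that match the majority label of their cluster."""
    counts = contingency(labels_true, labels_pred)
    return counts.max(axis=0).sum() / counts.sum()


def count_mixed_clusters(labels_true, labels_pred):
    """Number of clusters whose labelled members carry more than one label."""
    counts = contingency(labels_true, labels_pred)
    return int(((counts > 0).sum(axis=0) > 1).sum())


# Hand-assigned labels for a small sample, and the clusters those samples fell into.
y_sample = np.array(["cat", "cat", "dog", "dog", "dog", "bird"])
clusters = np.array([0, 0, 1, 1, 2, 2])
print(cluster_purity(y_sample, clusters))        # 5 of the 6 samples match their cluster's majority
print(count_mixed_clusters(y_sample, clusters))  # only cluster 2 mixes labels
```

If you prefer to stay within scikit-learn, `sklearn.metrics.cluster.contingency_matrix` builds the same rows-by-true-label, columns-by-cluster count matrix.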