[scikit-learn] Can I evaluate clustering efficiency incrementally?

Joel Nothman joel.nothman at gmail.com
Wed May 15 00:14:17 EDT 2019


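Evaluating on large datasets is easy if the sufficient statistics are just
the contingency matrix: it can be accumulated chunk by chunk, and metrics
such as the adjusted Rand index, mutual information and purity can then be
computed from that small matrix alone.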
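As a minimal sketch (not a scikit-learn or dask-ml API), the matrix can be
accumulated one chunk at a time. This assumes the true labels (e.g. the small
labelled sample) and the cluster assignments for each chunk are already
available as integers in known ranges [0, n_classes) and [0, n_clusters).
StreamingContingency, purity and adjusted_rand_index are hypothetical helper
names. Only the n_classes x n_clusters count matrix stays in memory.

```python
import numpy as np
from math import comb


class StreamingContingency:
    """Hypothetical helper: sum a contingency matrix over label chunks.

    Assumes labels are integers in [0, n_classes) and [0, n_clusters).
    """

    def __init__(self, n_classes, n_clusters):
        self.n_classes = n_classes
        self.n_clusters = n_clusters
        self.counts = np.zeros((n_classes, n_clusters), dtype=np.int64)

    def update(self, labels_true, labels_pred):
        labels_true = np.asarray(labels_true, dtype=np.int64)
        labels_pred = np.asarray(labels_pred, dtype=np.int64)
        # Flatten each (true, pred) pair into one index, count with bincount,
        # then reshape back so rows/columns stay aligned across chunks.
        flat = labels_true * self.n_clusters + labels_pred
        self.counts += np.bincount(
            flat, minlength=self.n_classes * self.n_clusters
        ).reshape(self.n_classes, self.n_clusters)


def purity(counts):
    # Standard purity: fraction of samples in their cluster's majority class.
    return counts.max(axis=0).sum() / counts.sum()


def adjusted_rand_index(counts):
    # ARI computed only from pair counts derived from the contingency matrix.
    n = int(counts.sum())
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivial agreement
    sum_cells = sum(comb(int(x), 2) for x in counts.ravel())
    sum_rows = sum(comb(int(x), 2) for x in counts.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in counts.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one class and one cluster
    return (sum_cells - expected) / (max_index - expected)


# Usage: feed chunks as they are read from disk.
acc = StreamingContingency(n_classes=2, n_clusters=2)
acc.update([0, 0, 1], [0, 0, 1])
acc.update([1, 1, 0], [1, 1, 1])
print(acc.counts.tolist())              # [[2, 1], [0, 3]]
print(purity(acc.counts))               # 5/6, about 0.833
print(adjusted_rand_index(acc.counts))  # 12/37, about 0.324
```

np.bincount over a flattened (true, pred) index keeps rows and columns
aligned across chunks. sklearn.metrics.cluster.contingency_matrix instead
indexes rows and columns by the labels present in each call, so per-chunk
results can't simply be summed unless every chunk contains every label. The
accumulated matrix can also be passed as the contingency argument of
sklearn.metrics.mutual_info_score.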

On Tue., 14 May 2019, 11:19 pm Tom Augspurger, <tom.augspurger88 at gmail.com>
wrote:

> If anyone is interested in implementing these, dask-ml would welcome
> additional metrics that work well with Dask arrays:
> https://github.com/dask/dask-ml/issues/213.
>
> On Tue, May 14, 2019 at 2:09 AM Uri Goren <ugoren at gmail.com> wrote:
>
>> Sounds like you need to use Spark;
>> this project looks promising:
>> https://github.com/xiaocai00/SparkPinkMST
>>
>> On Tue, May 14, 2019 at 5:12 AM lampahome <pahome.chen at mirlab.org> wrote:
>>
>>>
>>> Uri Goren <ugoren at gmail.com> wrote on Fri, 3 May 2019 at 7:29 PM:
>>>
>>>> I usually use clustering to save costs on labelling.
>>>> I like to apply hierarchical clustering, and then label a small sample
>>>> and fine-tune the clustering algorithm.
>>>>
>>>> That way, you can evaluate the effectiveness in terms of cluster purity
>>>> (how many clusters contain mixed labels).
>>>>
>>>> See an example with sklearn here:
>>>> https://youtu.be/GM8L324MuHc?list=PLqkckaeDLF4IDdKltyBwx8jLaz5nwDPQU
>>>>
>>>>
>>>> But if my dataset is too large to load into memory, will it work?
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>