[scikit-learn] A basic question about kmeans algorithms elkan and lloyd

Fri Mar 27 12:36:52 EDT 2020

There's an interesting analysis in this paper:
Fast K-Means with Accurate Bounds

http://proceedings.mlr.press/v48/newling16.pdf
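
For what it's worth, the triangle-inequality idea that elkan builds on can be sketched in a few lines of plain Python. This is only a minimal illustration under my own assumptions, not scikit-learn's implementation: the helper names (dist, assign_brute, assign_pruned) are hypothetical, and the sketch uses only the center-to-center bound for a single assignment step, whereas Elkan's full method also keeps per-point upper and lower bounds across iterations.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_brute(points, centers):
    """Lloyd-style assignment: compare every point with every center."""
    labels = []
    for p in points:
        labels.append(min(range(len(centers)), key=lambda j: dist(p, centers[j])))
    return labels, len(points) * len(centers)


def assign_pruned(points, centers):
    """Same assignment, skipping centers ruled out by the triangle inequality."""
    k = len(centers)
    # Center-to-center distances, computed once per assignment step.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, n_dist = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        n_dist += 1
        for j in range(1, k):
            # If d(best, j) >= 2 * d(p, best), then d(p, j) >= d(p, best),
            # so center j cannot be strictly closer and is skipped.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(p, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


if __name__ == "__main__":
    # Illustrative synthetic data (an assumption, not from the thread).
    random.seed(0)
    centers = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(8)]
    points = []
    for _ in range(500):
        cx, cy = random.choice(centers)
        points.append((cx + random.gauss(0, 1), cy + random.gauss(0, 1)))
    labels_b, n_b = assign_brute(points, centers)
    labels_p, n_p = assign_pruned(points, centers)
    print("same labels:", labels_b == labels_p)
    print("point-center distances:", n_b, "brute vs", n_p, "pruned")
```

Both functions are meant to return the same labels; only the number of point-to-center distance evaluations differs, and how much is saved depends on the data and on the number of centers. In scikit-learn itself the choice is made through the algorithm parameter of KMeans, e.g. KMeans(algorithm="elkan").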


On 3/26/20 3:40 AM, Alexandre Gramfort wrote:
> hi,
>
> I suspect Elkan only really wins when you have many centroids,
> so neither algorithm is systematically faster.
>
> my 2c
> Alex
>
>
> On Thu, Mar 26, 2020 at 3:18 AM MC_George123 at hotmail.com wrote:
>
>     Hi admins,
>
>     My team is currently working on optimizing parts of scikit-learn.
>     For kmeans, I see there are two algorithms, lloyd and elkan, where
>     elkan is an optimized version of lloyd that uses the triangle
>     inequality. In older versions of scikit-learn, elkan only supported
>     dense datasets, not sparse ones; in the latest version, it supports
>     both. So my question is: why are both algorithms kept in kmeans,
>     given that they do almost the same thing and elkan is the optimized
>     one? Is there any difference in precision between the two
>     algorithms, and how should I decide which one to use?
>
>     Best regards,
>
>     George Fan
>
>     _______________________________________________
>     scikit-learn mailing list
>     scikit-learn at python.org
>     https://mail.python.org/mailman/listinfo/scikit-learn