[scikit-learn] A basic question about kmeans algorithms elkan and llyod

Mon Mar 30 15:03:58 EDT 2020

Sorry, I thought it also ran experiments on what they call "sta", but I 
guess those are not included.
The conclusion is the same, though: different algorithms show different 
performance on different datasets.
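
As a toy illustration of why the result depends on the data, here is a
minimal sketch of the center-to-center triangle inequality test that
Elkan-style pruning relies on. It is not scikit-learn's implementation:
the names assign_with_pruning and make_blobs, the blob centers, and the
spread values are all made up for the example, and it covers only a
single assignment step.

```python
import math
import random


def assign_with_pruning(points, centers):
    """Nearest-center assignment that skips distances ruled out by the
    triangle inequality (toy sketch, one assignment step only)."""
    k = len(centers)
    # Center-to-center distances, computed once per assignment step.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), the triangle inequality
            # gives d(x, c_j) >= d(x, c_best), so c_j cannot be closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = math.dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def make_blobs(centers, n_per_center, spread, rng):
    """Gaussian blobs around the given centers (hypothetical toy data)."""
    return [
        tuple(c + rng.gauss(0.0, spread) for c in center)
        for center in centers
        for _ in range(n_per_center)
    ]


rng = random.Random(0)
centers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
separated = make_blobs(centers, 50, 1.0, rng)      # tight, far-apart blobs
overlapping = make_blobs(centers, 50, 200.0, rng)  # blobs smeared together

labels_sep, skipped_sep = assign_with_pruning(separated, centers)
labels_ovl, skipped_ovl = assign_with_pruning(overlapping, centers)
print("skipped (separated):  ", skipped_sep)
print("skipped (overlapping):", skipped_ovl)
```

On the well-separated blobs many of the candidate distances get skipped,
while on the heavily overlapping ones far fewer do, so the bookkeeping
for the bounds buys much less there.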

The Yinyang k-means paper has some Elkan vs. Lloyd figures:
http://proceedings.mlr.press/v37/ding15.pdf

In Table 2, in the Elkan row, a speedup below 1 means Elkan is slower 
than Lloyd.
Elkan is also more memory intensive, so you can see some missing values 
in that row where the computation couldn't be performed with Elkan but 
could with Lloyd.
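
For a rough sense of the memory cost: Elkan's algorithm keeps a lower
bound for every (point, center) pair, one upper bound per point, and the
center-to-center distances, none of which plain Lloyd needs. A
back-of-the-envelope sketch follows. The helper name elkan_extra_bytes
and the problem size are made up for illustration (they are not taken
from the paper's Table 2), and real implementations may store things
differently.

```python
def elkan_extra_bytes(n_samples, n_clusters, itemsize=8):
    """Rough extra memory for Elkan-style bounds, in bytes.

    Assumes float64 storage (itemsize=8); this is an estimate, not a
    measurement of any particular implementation.
    """
    lower_bounds = n_samples * n_clusters   # one per (point, center) pair
    upper_bounds = n_samples                # one per point
    center_dists = n_clusters * n_clusters  # center-to-center distances
    return (lower_bounds + upper_bounds + center_dists) * itemsize


# Hypothetical problem size: one million points, one thousand clusters.
extra = elkan_extra_bytes(1_000_000, 1_000)
print(f"{extra / 1e9:.3f} GB")  # prints 8.016 GB
```

The n_samples * n_clusters term dominates, so the overhead grows with
the number of centroids.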

On 3/30/20 3:33 AM, 樊 书华 wrote:
>
> Hi,
>
> Thanks for suggesting the paper. However, it compares many more 
> algorithms and finds that different algorithms perform differently 
> on datasets of various dimensions, and the Lloyd algorithm is not 
> included. What I want to know is whether we can remove the Lloyd 
> algorithm from KMeans in scikit-learn, since Elkan is an optimized 
> one with better performance.
>
> Best regards,
>
> George
>
> *From:* scikit-learn 
> <scikit-learn-bounces+mc_george123=hotmail.com at python.org> *On Behalf 
> Of *Andreas Mueller
> *Sent:* Saturday, March 28, 2020 12:37 AM
> *To:* scikit-learn at python.org
> *Subject:* Re: [scikit-learn] A basic question about kmeans algorithms 
> elkan and llyod
>
> There's an interesting analysis in this paper:
> Fast K-Means with Accurate Bounds
>
> http://proceedings.mlr.press/v48/newling16.pdf
>
> On 3/26/20 3:40 AM, Alexandre Gramfort wrote:
>
>     hi,
>
>     I suspect Elkan is really winning when you have many centroids
>
>     so the conclusion is not systematic
>
>     my 2c
>
>     Alex
>
>     On Thu, Mar 26, 2020 at 3:18 AM MC_George123 at hotmail.com wrote:
>
>         Hi admins,
>
>         My team is working on optimizing scikit-learn now.
>         When it comes to k-means, I find there are two algorithms:
>         Lloyd and Elkan, the latter being an optimized version of
>         Lloyd that uses the triangle inequality. In older versions
>         of scikit-learn, Elkan supported only dense datasets, not
>         sparse ones. In the latest version, Elkan supports both
>         types of datasets. So my question is why both algorithms
>         are kept in KMeans, since they do almost the same thing and
>         Elkan is an optimized version of Lloyd. Is there any
>         difference in precision between the two algorithms, and how
>         can I decide which one to use?
>
>         Best regards,
>
>         George Fan
>
>         _______________________________________________
>         scikit-learn mailing list
>         scikit-learn at python.org
>         https://mail.python.org/mailman/listinfo/scikit-learn
>
