[SciPy-User] kmeans

alex argriffi at ncsu.edu
Wed Jul 21 16:25:13 EDT 2010


Hi,

I want to nitpick about the scipy kmeans clustering implementation.
Throughout the documentation
(http://docs.scipy.org/doc/scipy/reference/cluster.vq.html) and the code, the
'distortion' of a clustering is defined as "the sum of the distances between
each observation vector and its dominating centroid."  I think that the sum
of squares of distances should be used instead of the sum of distances, and
all of the miscellaneous kmeans descriptions I found with Google seem to
support this.

For example, if one cluster contains the 1D points (1, 2, 3, 4, 10) and the
old center is 3, then the centroid updating step will move the centroid to
the mean, 4.  This step reduces the sum of squares of distances from 55 to
50, but it increases the distortion (the sum of distances) from 11 to 12.
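
The numbers in this example can be checked with a few lines of plain Python.
This is only a sketch of the arithmetic; the helper names sum_of_distances
and sum_of_squares are made up for illustration and are not part of
scipy.cluster.vq.

```python
# The 1D cluster from the example above.
points = [1, 2, 3, 4, 10]

def sum_of_distances(points, center):
    # 'Distortion' as the documentation words it: sum of absolute distances.
    return sum(abs(p - center) for p in points)

def sum_of_squares(points, center):
    # The objective usually given for k-means: sum of squared distances.
    return sum((p - center) ** 2 for p in points)

old_center = 3
# The centroid update moves the center to the mean of the cluster.
new_center = sum(points) / len(points)

print("sum of distances:", sum_of_distances(points, old_center),
      "->", sum_of_distances(points, new_center))
print("sum of squares:  ", sum_of_squares(points, old_center),
      "->", sum_of_squares(points, new_center))
```

The sum of distances is minimized by the median (3 here), while the sum of
squares is minimized by the mean (4 here), which is why the update step can
raise one quantity while lowering the other.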

Alex