[SciPy-User] kmeans

Benjamin Root ben.root at ou.edu
Thu Jul 22 10:48:04 EDT 2010


On Wed, Jul 21, 2010 at 3:25 PM, alex <argriffi at ncsu.edu> wrote:

> Hi,
>
> I want to nitpick about the scipy kmeans clustering implementation.
> Throughout the documentation
> http://docs.scipy.org/doc/scipy/reference/cluster.vq.html and code, the
> 'distortion' of a clustering is defined as "the sum of the distances between
> each observation vector and its dominating centroid."  I think that the sum
> of squares of distances should be used instead of the sum of distances, and
> all of the miscellaneous kmeans descriptions I found with Google would seem
> to support this.
>
> For example if one cluster contains the 1D points (1, 2, 3, 4, 10) and the
> old center is 3, then the centroid updating step will move the centroid to
> 4.  This step reduces the sum of squares of distances from 55 to 50, but it
> increases the distortion from 11 to 12.
>
> Alex
>
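
The arithmetic in the example above can be checked with a few lines of plain
Python. This is only a sketch of the two candidate definitions of distortion;
the helper names sum_abs and sum_sq are made up for illustration and are not
part of scipy.cluster.vq.

```python
# The 1D cluster from the example above.
points = [1, 2, 3, 4, 10]

def sum_abs(center):
    # Sum of distances: the definition of distortion quoted from the docs.
    return sum(abs(p - center) for p in points)

def sum_sq(center):
    # Sum of squared distances: the definition proposed instead.
    return sum((p - center) ** 2 for p in points)

old_center = 3
new_center = sum(points) / len(points)  # the centroid update moves to the mean, 4.0

print(sum_sq(old_center), sum_sq(new_center))    # 55 50.0
print(sum_abs(old_center), sum_abs(new_center))  # 11 12.0
```

The mean minimizes the sum of squared distances, so the update step can never
increase that quantity, whereas the sum of plain distances is minimized by the
median (3 here), which is why it goes up when the center moves to 4.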

Every implementation of kmeans that I have seen (except for SciPy's) has
allowed the user to specify which distance measure to use.  There is
no right answer for a distance measure except "it depends".  Maybe
SciPy's implementation should be updated to allow user-specified
distance measures (e.g. absolute, Euclidean, city-block)?
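
As a rough illustration of what a user-specified distance could look like,
here is a minimal pure-Python sketch of the assignment step only. The names
assign, euclidean and city_block and the distance= parameter are hypothetical;
nothing here is SciPy's actual API.

```python
import math

def euclidean(a, b):
    # Straight-line (L2) distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # City-block (L1, Manhattan) distance between two points.
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(observations, centroids, distance=euclidean):
    # For each observation, return the index of its nearest centroid
    # under the supplied distance function (hypothetical parameter).
    return [
        min(range(len(centroids)), key=lambda k: distance(obs, centroids[k]))
        for obs in observations
    ]

observations = [(0.0, 0.0), (4.0, 1.0)]
centroids = [(3.0, 3.0), (5.0, 0.0)]

print(assign(observations, centroids, distance=euclidean))   # [0, 1]
print(assign(observations, centroids, distance=city_block))  # [1, 1]
```

The first observation changes cluster depending on the measure.  Note that
swapping the distance only changes the assignment step; the centroid update
would have to change with it (for city-block the coordinate-wise median
minimizes the summed distance, not the mean), which ties back to Alex's point.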

Ben Root