[SciPy-User] kmeans

Benjamin Root ben.root at ou.edu
Fri Jul 23 20:46:46 EDT 2010


On Fri, Jul 23, 2010 at 6:48 PM, Keith Goodman <kwgoodman at gmail.com> wrote:

> On Fri, Jul 23, 2010 at 4:00 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
> > The stopping condition uses the change in the distortion, not a
> > non-squared distance.  The distortion is already a sum of squares.
> > The only place that a non-squared distance is used is in _py_vq_1d(),
> > which appears to be very old code and it has a raise error at the very
> > first statement.
>
> That's good news.
>
> Another place that a non-squared distance is used is the return value:
>
> >> import numpy as np
> >> from scipy import cluster
> >> v = np.array([1,2,3,4,10],dtype=float)
> >> cluster.vq.kmeans(v, 1)
>   (array([ 4.]), 2.3999999999999999)
>
> >> np.sqrt(np.dot(v-4, v-4) / 5.0)
>   3.1622776601683795  # Nope, not returned
> >> np.absolute(v - 4).mean()
>   2.3999999999999999 # Yep, this one is returned
>
> Is that a code bug or a doc bug?
>

Well, see, that's just the thing... the doc says that it returns the
distortion, which is what it does, but this distortion is evidently a mean
absolute error (MAE) and not a root-mean-square error (RMSE).  The problem is
that I have gone back and forth over the code, including the Cython version,
and I can't find anywhere that this happens.
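
One way to narrow this down without reading the internals is to compare the
returned value against the per-observation distances from vq().  This is only
a sketch: it assumes that the distances vq() reports are the same quantity
kmeans() averages, which I have not confirmed in the source.  The names obs,
mean_dist and rmse are just for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

v = np.array([1, 2, 3, 4, 10], dtype=float)
obs = v.reshape(-1, 1)  # 5 observations, 1 feature each

# With a single centroid the result does not depend on the random start.
codebook, distortion = kmeans(obs, 1)

# Distance from each observation to its nearest centroid, as reported by vq().
_, dists = vq(obs, codebook)

mean_dist = dists.mean()             # mean of the non-squared distances
rmse = np.sqrt((dists ** 2).mean())  # root of the mean squared distance

print("returned distortion:", distortion)
print("mean distance:      ", mean_dist)
print("RMSE:               ", rmse)
```

If the assumption holds, the first two numbers agree, and in one dimension a
non-squared Euclidean distance is just an absolute difference, which would
explain why the result looks like an MAE.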

Does anybody know of any good code-tracing tools?  I used trace once, but it
wasn't very user-friendly.
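
For reference, a minimal sketch of what I mean, assuming the standard-library
trace module: it lists the Python-level functions that run during one call.
The two functions here (abs_error and distortion_mae) are hypothetical
stand-ins, not SciPy code, and compiled extension functions would not show up
in such a listing.

```python
import trace


def abs_error(x, centroid):
    # Absolute deviation of one value from the centroid.
    return abs(x - centroid)


def distortion_mae(values, centroid):
    # Mean absolute deviation of all values from the centroid.
    return sum(abs_error(x, centroid) for x in values) / len(values)


# countfuncs=True records which functions were entered, not each line.
tracer = trace.Trace(count=False, trace=False, countfuncs=True)
result = tracer.runfunc(distortion_mae, [1.0, 2.0, 3.0, 4.0, 10.0], 4.0)

# calledfuncs is keyed by (filename, module name, function name).
called = sorted(
    {funcname for (_filename, _module, funcname) in tracer.results().calledfuncs}
)

print(result)
print(called)
```

Wrapping the kmeans call in runfunc() the same way should at least show which
Python functions are on the path to the returned value.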

Ben Root