[SciPy-User] kmeans
Benjamin Root
ben.root at ou.edu
Sun Jul 25 15:48:52 EDT 2010
On Sun, Jul 25, 2010 at 2:41 PM, David Cournapeau <cournape at gmail.com> wrote:
> On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman <kwgoodman at gmail.com>
> wrote:
> > _kmeans chokes on large thresholds:
> >
> > >>> import numpy as np
> > >>> from scipy import cluster
> > >>> v = np.array([1,2,3,4,10], dtype=float)
> > >>> cluster.vq.kmeans(v, 1, thresh=1e15)
> > (array([ 4.]), 2.3999999999999999)
> > >>> cluster.vq.kmeans(v, 1, thresh=1e16)
> > <snip>
> > IndexError: list index out of range
> >
> > The problem is in these lines:
> >
> > diff = thresh+1.
> > while diff > thresh:
> >     <snip>
> >     if(diff > thresh):
> >
> > If thresh is large then (thresh + 1) > thresh is False:
> >
> > >>> thresh = 1e16
> > >>> diff = thresh + 1.0
> > >>> diff > thresh
> > False
> >
> > What's a use case for a large threshold? You might want to study the
> > algorithm by seeing the result after one iteration (not to be confused
> > with the iter input which is something else).
> >
> > One fix is to use 2*thresh instead of thresh + 1. But that just
> > pushes the problem out to higher thresholds.
>
> Or just use the spacing function, which by definition returns the
> smallest number M such that thresh + M > thresh (except for nan/inf)
>
>
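For reference, a minimal sketch of the spacing suggestion, assuming thresh is
finite and non-negative. The helper name initial_diff is made up for this
illustration and is not part of scipy; np.spacing is the NumPy function that
returns the gap between a float and its adjacent representable value.

```python
import numpy as np

def initial_diff(thresh):
    """Hypothetical helper: a starting diff strictly greater than thresh.

    Assumes thresh is finite and non-negative.
    """
    # np.spacing(x) is the distance from x to the adjacent float,
    # so adding it always moves to a strictly larger value.
    return thresh + np.spacing(thresh)

for thresh in (1e-5, 1e15, 1e16):
    # Compare the original "thresh + 1" start value with the spacing-based one.
    print(thresh, thresh + 1.0 > thresh, initial_diff(thresh) > thresh)
```

With thresh = 1e16 the "thresh + 1" comparison is False, as in Keith's
example, while the spacing-based value still compares greater.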
Or, one could just go with a "prime the loop" approach and perform the
operation once before the loop begins. Admittedly, this does seem rather
un-Pythonic unless Python has a do...while idiom that I am unaware of.
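
For what it's worth, Python has no do...while statement; the usual substitute
is "while True:" with a break at the bottom of the body, which runs the body
at least once without needing a start value larger than thresh. A rough
sketch follows. The names iterate_until_converged and halve are invented for
illustration, and the update step is a toy stand-in, not the actual _kmeans
code.

```python
def iterate_until_converged(step, state, thresh):
    """Run step at least once; stop once the change is no longer > thresh."""
    n_iter = 0
    while True:
        state, diff = step(state)   # one update; diff is the size of the change
        n_iter += 1
        if not diff > thresh:       # test at the bottom: do...while behaviour
            break
    return state, n_iter

def halve(x):
    """Toy update step (illustration only): halve x and report the change."""
    new_x = x / 2.0
    return new_x, abs(x - new_x)

# A huge threshold gives exactly one iteration instead of an error.
print(iterate_until_converged(halve, 8.0, 1e16))
print(iterate_until_converged(halve, 8.0, 1.0))
```

Because the comparison happens after the first update, no value of thresh can
cause the loop body to be skipped.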
Ben Root