[SciPy-User] kmeans
David Cournapeau
cournape at gmail.com
Sun Jul 25 17:59:53 EDT 2010
On Mon, Jul 26, 2010 at 6:53 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Sun, Jul 25, 2010 at 12:41 PM, David Cournapeau <cournape at gmail.com> wrote:
>> On Sun, Jul 25, 2010 at 2:36 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> _kmeans chokes on large thresholds:
>>>
>>>>> from scipy import cluster
>>>>> v = np.array([1,2,3,4,10], dtype=float)
>>>>> cluster.vq.kmeans(v, 1, thresh=1e15)
>>> (array([ 4.]), 2.3999999999999999)
>>>>> cluster.vq.kmeans(v, 1, thresh=1e16)
>>> <snip>
>>> IndexError: list index out of range
>>>
>>> The problem is in these lines:
>>>
>>> diff = thresh+1.
>>> while diff > thresh:
>>> <snip>
>>> if(diff > thresh):
>>>
>>> If thresh is large then (thresh + 1) > thresh is False:
>>>
>>>>> thresh = 1e16
>>>>> diff = thresh + 1.0
>>>>> diff > thresh
>>> False
>>>
>>> What's a use case for a large threshold? You might want to study the
>>> algorithm by seeing the result after one iteration (not to be confused
>>> with the iter input which is something else).
>>>
>>> One fix is to use 2*thresh instead of thresh + 1. But that just
>>> pushes the problem out to higher thresholds
>>
>> Or just use the spacing function, which by definition returns the
>> smallest number M such that thresh + M > thresh (except for nan/inf)
>
> Neat, I've never heard of np.spacing. But it suffers the same fate:
>
> Works:
>
>>> thresh = 1e16
>>> diff = thresh + np.spacing(thresh)
>>> diff > thresh
> True
>
> Doesn't work:
>
>>> thresh = 1e400
>>> diff = thresh + np.spacing(thresh)
>>> diff > thresh
> False
That's because 1e400 is inf for double precision numbers, and
inf + N > inf is never true :)
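
For anyone who wants to check this themselves, here is a minimal sketch, assuming NumPy is imported as np. The helper name exceeds is made up for illustration and is not part of scipy.cluster.vq; it only repeats the comparison from the examples quoted above.

```python
import numpy as np


def exceeds(thresh):
    """Return True if thresh + np.spacing(thresh) compares greater than thresh.

    Hypothetical helper for illustration only; not part of SciPy.
    """
    # Silence floating-point warnings that the inf case may trigger.
    with np.errstate(invalid="ignore"):
        return bool(thresh + np.spacing(thresh) > thresh)


# Finite double: adding the spacing reaches the next representable value.
finite_ok = exceeds(1e16)

# The literal 1e400 overflows a double, so it is already inf.
literal_is_inf = bool(np.isinf(1e400))

# Nothing compares greater than inf, so the check fails here.
inf_ok = exceeds(1e400)

print(finite_ok, literal_is_inf, inf_ok)  # True True False
```

So the spacing approach covers every finite threshold; only inf (and nan) fall outside it.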
David