[Numpy-discussion] Re: scipy.stats.itemfreq: overflow with add.reduce
Tim Churches
tchur at optushome.com.au
Wed Dec 21 12:39:03 EST 2005
Hans Georg Krauthaeuser wrote:
> Hans Georg Krauthaeuser wrote:
>
>>Hi All,
>>
>>I was playing with scipy.stats.itemfreq when I observed the following
>>overflow:
>>
>>In [119]:for i in [254,255,256,257,258]:
>> .....: l=[0]*i
>> .....: print i, stats.itemfreq(l), l.count(0)
>> .....:
>>254 [ [ 0 254]] 254
>>255 [ [ 0 255]] 255
>>256 [ [0 0]] 256
>>257 [ [0 1]] 257
>>258 [ [0 2]] 258
>>
>>itemfreq is pretty small (in stats.py):
>>
>>----------------------------------------------------------------------
>>def itemfreq(a):
>>    """
>>    Returns a 2D array of item frequencies. Column 1 contains item values,
>>    column 2 contains their respective counts. Assumes a 1D array is passed.
>>
>>    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
>>    """
>>    scores = _support.unique(a)
>>    scores = sort(scores)
>>    freq = zeros(len(scores))
>>    for i in range(len(scores)):
>>        freq[i] = add.reduce(equal(a, scores[i]))
>>    return array(_support.abut(scores, freq))
>>----------------------------------------------------------------------
>>
>>It seems that add.reduce is the source for the overflow:
>>
>>In [116]:from scipy import *
>>
>>In [117]:for i in [254,255,256,257,258]:
>> .....: l=[0]*i
>> .....: print i, add.reduce(equal(l,0))
>> .....:
>>254 254
>>255 255
>>256 0
>>257 1
>>258 2
>>
>>Is there any way to avoid the overflow?
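The wrap-around above, and the obvious ways to avoid it, can be sketched in
modern NumPy by controlling the accumulator dtype explicitly (a sketch under
the assumption that the old comparison produced an 8-bit result, which is
what the wrap at 256 suggests; this is not the Numeric-era scipy code):

```python
import numpy as np

# Mimic the failure: 258 matches stored as 1s in an 8-bit array
# (the wrap at 256 in the session above suggests an 8-bit result
# from equal()), reduced with a 1-byte accumulator.
mask = np.ones(258, dtype=np.uint8)
wrapped = np.add.reduce(mask, dtype=np.uint8)   # 258 mod 256 -> 2

# Avoid the overflow: widen the accumulator explicitly, or use a
# counting routine that always returns a platform-sized integer.
safe = np.add.reduce(mask, dtype=np.intp)       # -> 258
counted = np.count_nonzero(mask)                # -> 258
print(wrapped, safe, counted)
```

Passing dtype to the reduction sidesteps any question about what the
default accumulator type is for small integer inputs.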
Apropos the preceding, herewith a thread from the NumPy list from more than
a few months ago. The take-home message is that for integer arrays,
add.reduce is very fast at producing results that fall into one of two
categories: a) correct, or b) incorrect due to overflow. Unfortunately,
there is no equally quick method of determining into which of these two
categories any specific result returned by add.reduce falls.
Tim C
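For readers hitting this today: the frequency table that itemfreq builds can
be produced in one call with np.unique, which counts with a platform-sized
integer and so cannot wrap at 256. A sketch (itemfreq_np is an illustrative
name, not a scipy function):

```python
import numpy as np

def itemfreq_np(a):
    """Two-column table of unique values and their counts.

    Illustrative stand-in for the itemfreq quoted above; np.unique
    with return_counts=True counts in a platform-sized integer, so
    there is no 8-bit wrap-around at 256.
    """
    scores, freq = np.unique(np.asarray(a), return_counts=True)
    return np.column_stack((scores, freq))

table = itemfreq_np([0] * 258)
print(table)   # a single row: value 0 with count 258
```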