No subject

Tim Churches tchur at optushome.com.au
Thu Nov 16 16:50:41 EST 2006


Todd Miller wrote:
> On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote:
>
>>I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit
>>Linux):
>>
>> >>> import Numeric as N
>> >>> a = N.array((2000000000,1000000000),typecode=N.Int32)
>> >>> N.add.reduce(a)
>>-1294967296
>>
>>OK, it is an elementary mistake, but the silent overflow caught me
>>unawares. Casting the array to Float64 before summing it avoids the
>>error, but in my instance the actual data is a rank-1 array of 21
>>million integers with a mean value of about 140 (which adds up to
>>more than sys.maxint), and casting to Float64 will use quite a lot of
>>memory (as well as taking some time).
>>
>>Any advice for catching or avoiding such overflow without always
>>incurring the performance and memory hit of casting to Float64?
>
>
> Here's what numarray does:
>
>
>>>>import numarray as N
>>>>a = N.array((2000000000,1000000000),typecode=N.Int32)
>>>>N.add.reduce(a)
>
> -1294967296
>
> So basic reductions in numarray have the same "careful while you're
> shaving" behavior as Numeric; it's fast but easy to screw up.
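
For reference, that wrapped value follows directly from 32-bit
two's-complement arithmetic: the true sum, 3000000000, exceeds
2**31 - 1 (i.e. 2147483647), so the result comes back reduced by
2**32, as plain Python confirms:

>>> 2000000000 + 1000000000 - 2**32
-1294967296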

Sure, but how does one be careful? It seems that for any array of two
or more integers which could sum to more than sys.maxint or less than
-sys.maxint, add.reduce() in both numarray and Numeric will give either
a) the correct answer or b) an incorrect answer, and short of adding up
the array using a safer but much slower method, there is no way of
determining whether the answer provided (quickly) by add.reduce() is
right or wrong. That seems to make it fast but useless, for integer
arrays at least. Is that an unfair summary? Can anyone point me towards
a method for using add.reduce() on small arrays of large integers with
values in the billions, or on large arrays of fairly small integer
values, which will not suddenly and without warning give the wrong
answer?
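
The best workaround I have come up with myself is to cast and sum the
array one chunk at a time, so that only a chunk-sized Float64
temporary ever exists at once (an untested sketch; the function name
and chunk size are arbitrary):

import Numeric as N

def chunked_sum(a, chunksize=1000000):
    # Sum a rank-1 integer array via Float64 partial sums, casting
    # only one chunk at a time to bound the extra memory used.
    # Float64 partial sums are exact for integer totals up to 2**53,
    # which comfortably covers totals in the billions.
    total = 0.0
    for start in range(0, len(a), chunksize):
        chunk = a[start:start + chunksize].astype(N.Float64)
        total = total + N.add.reduce(chunk)
    return total

>>> chunked_sum(N.array((2000000000, 1000000000), typecode=N.Int32))
3000000000.0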

>
> But:
>
>
>>>>a.sum()
>
> 3000000000L
>
>>>>a.sum(type='d')
>
> 3000000000.0
>
> a.sum() blockwise upcasts to the largest type of its kind on the fly,
> in this case, Int64. This avoids the storage overhead of typecasting
> the entire array.

That's on a 64-bit platform, right? Could the same method be used to
cast the accumulator to Float64 on a 32-bit platform, to avoid casting
the entire array?

> A better name for the method would have been sumall(), since it sums
> all elements of a multi-dimensional array. The flattening process
> reduces along one dimension before flattening, preventing a full copy
> of a discontiguous array. It could be smarter about choosing the
> dimension of the initial reduction.
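
If I follow, that is roughly equivalent to reducing along one axis
first and then summing the result, e.g. (shapes and values here are
just illustrative):

>>> import numarray as N
>>> m = N.reshape(N.arange(6), (2, 3))
>>> N.add.reduce(N.add.reduce(m))  # reduce axis 0, then sum the rest
15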

OK, thanks. Unfortunately it is not possible for us to port our
application to numarray at the moment. But the insight is most helpful.

Tim C



