[Numpy-discussion] Silent overflow of Int32 array

Sun Apr 10 07:25:08 EDT 2005

On Sun, 2005-04-10 at 10:23 +1000, Tim Churches wrote:
> I just got caught by code equivalent to this (with NumPy 23.8 on 32 bit 
> Linux):
> 
>  >>> import Numeric as N
>  >>> a = N.array((2000000000,1000000000),typecode=N.Int32)
>  >>> N.add.reduce(a)
> -1294967296
> 
> OK, it is an elementary mistake, but the silent overflow caught me 
> unawares. Casting the array to Float64 before summing it avoids the 
> error, but in my instance the actual data is a rank-1 array of 21 
> million integers with a mean value of about 140 (which adds up to more 
> than sys.maxint), and casting to Float64 will use quite a lot of memory 
> (as well as taking some time).
> 
> Any advice for catching or avoiding such overflow without always 
> incurring a performance and memory hit by always casting to Float64? 

Here's what numarray does:

>>> import numarray as N
>>> a = N.array((2000000000,1000000000),typecode=N.Int32)
>>> N.add.reduce(a)
-1294967296

So basic reductions in numarray have the same "careful while you're
shaving" behavior as Numeric;  it's fast but easy to screw up.
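
The negative result is ordinary two's-complement wraparound: the true
sum, 3000000000, is larger than the Int32 maximum of 2147483647, so the
stored value wraps modulo 2**32.  A minimal sketch of that arithmetic in
plain Python (wrap_int32 is a hypothetical helper for illustration, not
part of Numeric or numarray):

```python
def wrap_int32(value):
    """Return the value a signed 32-bit slot would hold for an exact integer."""
    # Shift into the unsigned range, wrap modulo 2**32, then shift back.
    return (value + 2**31) % 2**32 - 2**31

true_sum = 2000000000 + 1000000000   # exact Python integer arithmetic
wrapped = wrap_int32(true_sum)
print(true_sum, wrapped)             # 3000000000 -1294967296
```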

But:

>>> a.sum()
3000000000L
>>> a.sum(type='d')
3000000000.0

a.sum() upcasts blockwise, on the fly, to the largest type of the same
kind, in this case Int64.  This avoids the storage overhead of
typecasting the entire array.
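
The blockwise idea can be sketched roughly as follows.  This is only an
illustration and not numarray's actual implementation: it assumes the
modern NumPy package (imported as np) as a stand-in, and blockwise_sum
and block_size are hypothetical names chosen for the example.

```python
import numpy as np

def blockwise_sum(values, block_size=65536):
    """Sum a rank-1 integer array, upcasting one block at a time to int64."""
    total = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Only this block is converted, so the temporary int64 storage
        # is bounded by block_size rather than by the whole array.
        total += int(block.astype(np.int64).sum(dtype=np.int64))
    return total

a = np.array([2000000000, 1000000000], dtype=np.int32)
wrapped = int(np.add.reduce(a, dtype=np.int32))  # int32 accumulator wraps silently
exact = blockwise_sum(a)
print(wrapped, exact)
```

With a 21-million-element Int32 array, the temporary in this sketch is at
most block_size 8-byte integers, rather than the 8 bytes per element that
a full conversion to Float64 or Int64 would need.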

A better name for the method would have been sumall(), since it sums all
elements of a multi-dimensional array.  It reduces along one dimension
first and only then flattens the result, which avoids making a full copy
of a discontiguous array.  It could be smarter about choosing the
dimension for the initial reduction.
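
A rough sketch of the reduce-then-flatten order, again assuming NumPy as
a stand-in rather than showing numarray's own code (the array names are
made up for the example):

```python
import numpy as np

base = np.arange(24, dtype=np.int32).reshape(4, 6)
view = base[:, ::2]   # strided view of every other column: discontiguous
assert not view.flags['C_CONTIGUOUS']

# Reduce along one axis first, accumulating in int64.  The intermediate
# result has only view.shape[1] elements, so the discontiguous view never
# has to be copied in full just to be flattened.
partial = np.add.reduce(view, axis=0, dtype=np.int64)
total = int(partial.sum())
print(partial, total)
```

Here the first reduction runs along axis 0; picking the axis that leaves
the smallest intermediate array is the kind of choice that could be made
smarter.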

Regards,
Todd