[Numpy-discussion] large float32 array issue

Wed Nov 3 07:39:08 EDT 2010

On 11/03/2010 12:31 PM, Warren Weckesser wrote:
>
>
> On Wed, Nov 3, 2010 at 5:59 AM, Warren Weckesser
> <warren.weckesser at enthought.com> wrote:
>
>
>
>     On Wed, Nov 3, 2010 at 3:54 AM, Vincent Schut <schut at sarvision.nl>
>     wrote:
>
>         Hi, I'm running into a strange issue when using some pretty large
>         float32 arrays. In the following code I create a large array
>         filled with ones and calculate its mean and sum, first with a
>         float64 version, then with a float32 version. Note the difference
>         between the two. NB: the float64 version is obviously right :-)
>
>
>
>         In [2]: areaGrid = numpy.ones((11334, 16002))
>         In [3]: print(areaGrid.dtype)
>         float64
>         In [4]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(),
>         areaGrid.mean(), areaGrid.sum())
>         ((11334, 16002), 1.0, 1.0, 1.0, 181366668.0)
>
>
>         In [5]: areaGrid = numpy.ones((11334, 16002), numpy.float32)
>         In [6]: print(areaGrid.dtype)
>         float32
>         In [7]: print(areaGrid.shape, areaGrid.min(), areaGrid.max(),
>         areaGrid.mean(), areaGrid.sum())
>         ((11334, 16002), 1.0, 1.0, 0.092504406598019437, 16777216.0)
>
>
>         Can anybody confirm this? And better: explain it? Am I running
>         into an IEEE float 'feature' that was hidden from me until now?
>         Or is it a bug somewhere?
>
>         Btw I'd like to use float32 arrays, as precision is not really
>         an issue in this case, but memory usage is...
>
>
>         This is using Python 2.7, numpy from git (yesterday's checkout),
>         on Arch Linux 64-bit.
>
>
>
>     The problem kicks in once the array of ones has more than 2**24
>     elements. Note that np.float32(2**24) + np.float32(1.0) equals
>     np.float32(2**24), so a float32 running total stops growing there:
>
>
>     In [41]: b = np.ones(2**24, np.float32)
>
>     In [42]: b.size, b.sum()
>     Out[42]: (16777216, 16777216.0)
>
>     In [43]: b = np.ones(2**24+1, np.float32)
>
>     In [44]: b.size, b.sum()
>     Out[44]: (16777217, 16777216.0)
>
>     In [45]: np.spacing(np.float32(2**24))
>     Out[45]: 2.0
>
>     In [46]: np.float32(2**24) + np.float32(1)
>     Out[46]: 16777216.0
>
>
>
>
> By the way, you can override the dtype of the accumulator of the mean()
> function:
>
> In [61]: a = np.ones((11334,16002),np.float32)
>
> In [62]: a.mean()  # Not correct
> Out[62]: 0.092504406598019437
>
> In [63]: a.mean(dtype=np.float64)
> Out[63]: 1.0
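
A minimal sketch of the arithmetic behind the explanation quoted above, assuming the float32 total is built up one element at a time (an assumption about the reduction, not something checked against the numpy source). It uses only scalar operations, so it does not depend on how a given numpy version sums arrays. The names total, one, stalled, gap and stuck_mean are illustrative only:

```python
import numpy as np

total = np.float32(2**24)   # 16777216.0, the value the float32 sum got stuck at
one = np.float32(1.0)

# Above 2**24 adjacent float32 values are 2.0 apart, so adding 1.0
# rounds back to the same number and the total stops growing.
stalled = (total + one) == total
gap = np.spacing(total)

# The same addition carried out in float64 keeps the increment.
total64 = np.float64(total) + np.float64(one)

# A total stuck at 2**24, divided by the element count of the
# (11334, 16002) array, gives the mean from the original report.
stuck_mean = 2.0**24 / (11334 * 16002)

print(stalled, gap, total64, stuck_mean)
```

The last value comes out at roughly 0.0925, which matches the float32 mean() reported above.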

Thanks for this. That at least gives me a temporary solution (I actually 
need sum() instead of mean(), but the trick works for sum too).
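
For reference, the same override applied to sum(), as a sketch on a smaller array than the one in the original report (2**24 + 1 ones, about 64 MB, a size picked here only for illustration). The data stays float32 in memory; only the accumulator is float64:

```python
import numpy as np

# One element more than float32 can count to in steps of 1.0.
n = 2**24 + 1
a = np.ones(n, dtype=np.float32)

# Accumulate in float64 while leaving the array itself as float32.
total = a.sum(dtype=np.float64)
mean = a.mean(dtype=np.float64)

# The plain a.sum() is left out on purpose: what it returns may depend
# on the numpy version and the summation strategy it uses.
print(a.dtype, total, mean)
```

This keeps the memory saving of float32 storage, which was the reason for using float32 in the first place.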

Btw, should I file a bug on this?

Vincent.

>
>
> Warren