[Numpy-discussion] bug in numpy.mean() ?

Bruce Southey bsouthey at gmail.com
Tue Jan 24 13:50:31 EST 2012


On 01/24/2012 12:33 PM, K.-Michael Aye wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
>
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
>
> Then:
> In [3]: data.mean()
> Out[3]: 3067.0243839999998
>
> In [4]: data.max()
> Out[4]: 3052.4343
>
> In [5]: data.shape
> Out[5]: (1000, 1000)
>
> In [6]: data.min()
> Out[6]: 3040.498
>
> In [7]: data.dtype
> Out[7]: dtype('float32')
>
>
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstood how data.mean() works, per axis
> and so on, but doing the same with a flattened version gave the same
> results.
>
> Am I really soo tired that I can't see what I am doing wrong here?
> For completion, the data was read by a osgeo.gdal dataset method called
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on
> Enthought's EPD.
>
> Best regards,
> Michael
>
You have a million 32-bit floating point numbers that are in the 
thousands, so the running sum climbs to roughly 3e9, which is beyond the 
~7 significant decimal digits a 32-bit float carries; the low-order bits 
of each addition get lost along the way. You need to increase the 
precision of the accumulator that np.mean() uses (its dtype argument) or 
change the input dtype:
>>> a.mean(dtype=np.float32)  # the default for float32 input; lacks precision
3067.0243839999998
>>> a.mean(dtype=np.float64)
3045.747251076416
>>> a.mean(dtype=np.float128)
3045.7472510764160156
>>> b = a.astype(np.float128)
>>> b.mean()
3045.7472510764160156

Otherwise you are left using some alternative approach to calculate the 
mean yourself; one possibility is sketched below.
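For example, if you would rather not cast the whole array, you could walk 
it in blocks and keep the running total in a 64-bit float. This is only a 
minimal sketch (the function name and block size are mine, not anything 
in NumPy):

import numpy as np

def blockwise_mean(arr, block=65536):
    # Flatten once, then promote one block at a time, so the running
    # total never lives in 32-bit precision and memory use stays flat.
    flat = arr.ravel()
    total = 0.0                      # Python float, i.e. 64-bit
    for start in range(0, flat.size, block):
        total += flat[start:start + block].sum(dtype=np.float64)
    return total / flat.size

blockwise_mean(data) should then agree with data.mean(dtype=np.float64). 
The standard library's math.fsum(data.ravel()) / data.size would also 
give an accurately rounded sum, at the cost of a pure-Python loop over a 
million elements.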

Bruce
