[Numpy-discussion] bug in numpy.mean() ?

Tue Jan 24 13:58:57 EST 2012

On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:

> I know I know, that's pretty outrageous to even suggest, but please 
> bear with me, I am stumped as you may be:
> 
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
> 
> Then:
> In [3]: data.mean()
> Out[3]: 3067.0243839999998
> 
> In [4]: data.max()
> Out[4]: 3052.4343
> 
> In [5]: data.shape
> Out[5]: (1000, 1000)
> 
> In [6]: data.min()
> Out[6]: 3040.498
> 
> In [7]: data.dtype
> Out[7]: dtype('float32')
> 
> 
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstand how data.mean() works, per axis 
> and so on, but did the same with a flattenend version with the same 
> results.
> 
> Am I really soo tired that I can't see what I am doing wrong here?
> For completion, the data was read by a osgeo.gdal dataset method called 
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on 
> Enthought's EPD.

I get the same result:

In [1]: import numpy

In [2]: data = numpy.load('data.npy')

In [3]: data.mean()
Out[3]: 3067.0243839999998

In [4]: data.max()
Out[4]: 3052.4343

In [5]: data.min()
Out[5]: 3040.498

In [6]: numpy.version.version
Out[6]: '2.0.0.dev-433b02a'

This on OS X 10.7.2 with Python 2.7.1, on an intel Core i7. Running python as a 32 vs. 64-bit process doesn't make a difference.

The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological?

This is fun too:
In [12]: data.mean()
Out[12]: 3067.0243839999998

In [13]: (data/3000).mean()*3000
Out[13]: 3020.8074375000001

In [15]: (data/2).mean()*2
Out[15]: 3067.0243839999998

In [16]: (data/200).mean()*200
Out[16]: 3013.6754000000001

Zach