[Numpy-discussion] bug in numpy.mean() ?
Zachary Pincus
zachary.pincus at yale.edu
Tue Jan 24 13:58:57 EST 2012
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:
> I know I know, that's pretty outrageous to even suggest, but please
> bear with me, I am stumped as you may be:
>
> 2-D data file here:
> http://dl.dropbox.com/u/139035/data.npy
>
> Then:
> In [3]: data.mean()
> Out[3]: 3067.0243839999998
>
> In [4]: data.max()
> Out[4]: 3052.4343
>
> In [5]: data.shape
> Out[5]: (1000, 1000)
>
> In [6]: data.min()
> Out[6]: 3040.498
>
> In [7]: data.dtype
> Out[7]: dtype('float32')
>
>
> A mean value calculated per loop over the data gives me 3045.747251076416
> I first thought I still misunderstand how data.mean() works, per axis
> and so on, but did the same with a flattened version with the same
> results.
>
> Am I really so tired that I can't see what I am doing wrong here?
> For completeness, the data was read by an osgeo.gdal dataset method called
> ReadAsArray()
> My numpy.__version__ gives me 1.6.1 and my whole setup is based on
> Enthought's EPD.
I get the same result:
In [1]: import numpy
In [2]: data = numpy.load('data.npy')
In [3]: data.mean()
Out[3]: 3067.0243839999998
In [4]: data.max()
Out[4]: 3052.4343
In [5]: data.min()
Out[5]: 3040.498
In [6]: numpy.version.version
Out[6]: '2.0.0.dev-433b02a'
This is on OS X 10.7.2 with Python 2.7.1, on an Intel Core i7. Running Python as a 32-bit vs. 64-bit process doesn't make a difference.
The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological?
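One way to probe the floating-point hypothesis: the numpy.mean documentation notes that for floating-point input the mean is accumulated in the input's own precision, so a float32 array is summed in float32 unless a wider accumulator is requested through the dtype argument. The sketch below is only a check under stated assumptions: it uses a synthetic array with the same shape, dtype, and value range as data.npy (not the actual file), and the default float32 result may look fine on recent NumPy releases, which I believe sum more carefully than 1.6.1 did.

```python
import numpy as np

# Synthetic stand-in for data.npy (assumption: same shape, dtype, and
# value range as the array in this thread, but NOT the real data).
rng = np.random.default_rng(0)
data = rng.uniform(3040.498, 3052.4343, size=(1000, 1000)).astype(np.float32)

# Default behaviour: the accumulator precision follows the input dtype,
# so the sum is carried in float32 here.
mean_default = float(data.mean())

# Explicitly request a float64 accumulator for the same float32 data.
mean_wide = float(data.mean(dtype=np.float64))

print("float32 accumulator:", mean_default)
print("float64 accumulator:", mean_wide)
print("float64 mean within [min, max]:",
      float(data.min()) <= mean_wide <= float(data.max()))
```

If the float64-accumulator mean of the real array falls between data.min() and data.max() while the default one does not, that would point at accumulated rounding error rather than at the data itself.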
This is fun too:
In [12]: data.mean()
Out[12]: 3067.0243839999998
In [13]: (data/3000).mean()*3000
Out[13]: 3020.8074375000001
In [15]: (data/2).mean()*2
Out[15]: 3067.0243839999998
In [16]: (data/200).mean()*200
Out[16]: 3013.6754000000001
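These scaling results would be consistent with rounding error in a float32 running sum: dividing by 2 is exact in binary floating point, so every rounding step is unchanged, while dividing by 200 or 3000 changes which way each addition rounds. A minimal sketch of that idea follows. It assumes a strictly left-to-right float32 accumulator, emulated here with np.cumsum; the helper name sequential_float32_mean is made up for illustration, the array is synthetic rather than the real data.npy, and this is not necessarily how numpy.mean is implemented internally.

```python
import numpy as np

def sequential_float32_mean(a):
    """Mean from a left-to-right running sum kept in float32.

    Hypothetical helper: np.cumsum rounds the running total to float32
    after every addition, which emulates a naive float32 accumulator.
    """
    flat = np.ascontiguousarray(a, dtype=np.float32).ravel()
    running = np.cumsum(flat, dtype=np.float32)
    return float(running[-1]) / flat.size

# Synthetic stand-in for data.npy (assumption: same shape, dtype, range).
rng = np.random.default_rng(0)
data = rng.uniform(3040.498, 3052.4343, size=(1000, 1000)).astype(np.float32)

naive = sequential_float32_mean(data)
reference = float(data.mean(dtype=np.float64))
print("naive float32 running sum:", naive)
print("float64 accumulator:      ", reference)
print("data.max():               ", float(data.max()))

# Repeat the scaling experiment with the naive accumulator.
for divisor in (2, 200, 3000):
    scaled = data / np.float32(divisor)
    print(divisor, sequential_float32_mean(scaled) * divisor)

# Scaling by a power of two should leave the result bit-for-bit identical.
halved = sequential_float32_mean(data / np.float32(2)) * 2
print("divide-by-2 result identical:", halved == naive)
```

Once the running total is in the billions, float32 can only represent it in steps of 128 or 256, so each added value of about 3046 gets rounded to a multiple of that step. With values in this range, that should push the naive mean above data.max(), which is the kind of overshoot seen above.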
Zach