I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be: 2-D data file here: http://dl.dropbox.com/u/139035/data.npy Then: In [3]: data.mean() Out[3]: 3067.0243839999998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.shape Out[5]: (1000, 1000) In [6]: data.min() Out[6]: 3040.498 In [7]: data.dtype Out[7]: dtype('float32') A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results. Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD. Best regards, Michael
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:
2-D data file here: http://dl.dropbox.com/u/139035/data.npy
Then: In [3]: data.mean() Out[3]: 3067.0243839999998
In [4]: data.max() Out[4]: 3052.4343
In [5]: data.shape Out[5]: (1000, 1000)
In [6]: data.min() Out[6]: 3040.498
In [7]: data.dtype Out[7]: dtype('float32')
A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results.
Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.
Best regards, Michael
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion You have a million 32-bit floating point numbers that are in the
On 01/24/2012 12:33 PM, K.-Michael Aye wrote: thousands. Thus you are exceeding the 32-bitfloat precision and, if you can, you need to increase precision of the accumulator in np.mean() or change the input dtype:
a.mean(dtype=np.float32) # default and lacks precision 3067.0243839999998 a.mean(dtype=np.float64) 3045.747251076416 a.mean(dtype=np.float128) 3045.7472510764160156 b=a.astype(np.float128) b.mean() 3045.7472510764160156
Otherwise you are left to using some alternative approach to calculate the mean. Bruce
You have a million 32-bit floating point numbers that are in the thousands. Thus you are exceeding the 32-bitfloat precision and, if you can, you need to increase precision of the accumulator in np.mean() or change the input dtype:
a.mean(dtype=np.float32) # default and lacks precision 3067.0243839999998 a.mean(dtype=np.float64) 3045.747251076416 a.mean(dtype=np.float128) 3045.7472510764160156 b=a.astype(np.float128) b.mean() 3045.7472510764160156
Otherwise you are left to using some alternative approach to calculate the mean.
Bruce
Interesting -- I knew that float64 accumulators were used with integer arrays, and I had just assumed that 64-bit or higher accumulators would be used with floating-point arrays too, instead of the array's dtype. This is actually quite a bit of a gotcha for floating-point imaging-type tasks -- good to know! Zach
I have confirmed this on a 64-bit linux machine running python 2.7.2 with the development version of numpy. It seems to be related to using float32 instead of float64. If the array is first converted to a 64-bit float (via astype), mean gives an answer that agrees with your looped-calculation value: 3045.7472500000002. With the original 32-bit array, averaging successively on one axis and then on the other gives answers that agree with the 64-bit float answer to the second decimal place. In [125]: d = np.load('data.npy') In [126]: d.mean() Out[126]: 3067.0243839999998 In [127]: d64 = d.astype('float64') In [128]: d64.mean() Out[128]: 3045.747251076416 In [129]: d.mean(axis=0).mean() Out[129]: 3045.7487500000002 In [130]: d.mean(axis=1).mean() Out[130]: 3045.7444999999998 In [131]: np.version.full_version Out[131]: '2.0.0.dev-55472ca' -- On Tue, 2012-01-24 at 12:33 -0600, K.-MichaelA wrote:
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:
2-D data file here: http://dl.dropbox.com/u/139035/data.npy
Then: In [3]: data.mean() Out[3]: 3067.0243839999998
In [4]: data.max() Out[4]: 3052.4343
In [5]: data.shape Out[5]: (1000, 1000)
In [6]: data.min() Out[6]: 3040.498
In [7]: data.dtype Out[7]: dtype('float32')
A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results.
Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.
Best regards, Michael
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- -------------------------------------------------- Kathleen M. Tacina NASA Glenn Research Center MS 5-10 21000 Brookpark Road Cleveland, OH 44135 Telephone: (216) 433-6660 Fax: (216) 433-5802 --------------------------------------------------
On Jan 24, 2012, at 1:33 PM, K.-Michael Aye wrote:
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:
2-D data file here: http://dl.dropbox.com/u/139035/data.npy
Then: In [3]: data.mean() Out[3]: 3067.0243839999998
In [4]: data.max() Out[4]: 3052.4343
In [5]: data.shape Out[5]: (1000, 1000)
In [6]: data.min() Out[6]: 3040.498
In [7]: data.dtype Out[7]: dtype('float32')
A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results.
Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.
I get the same result: In [1]: import numpy In [2]: data = numpy.load('data.npy') In [3]: data.mean() Out[3]: 3067.0243839999998 In [4]: data.max() Out[4]: 3052.4343 In [5]: data.min() Out[5]: 3040.498 In [6]: numpy.version.version Out[6]: '2.0.0.dev-433b02a' This on OS X 10.7.2 with Python 2.7.1, on an intel Core i7. Running python as a 32 vs. 64-bit process doesn't make a difference. The data matrix doesn't look too strange when I view it as an image -- all pretty smooth variation around the (min, max) range. But maybe it's still somehow floating-point pathological? This is fun too: In [12]: data.mean() Out[12]: 3067.0243839999998 In [13]: (data/3000).mean()*3000 Out[13]: 3020.8074375000001 In [15]: (data/2).mean()*2 Out[15]: 3067.0243839999998 In [16]: (data/200).mean()*200 Out[16]: 3013.6754000000001 Zach
Just what Bruce said. You can run the following to confirm: np.mean(data - data.mean()) If for some reason you do not want to convert to float64 you can add the result of the previous line to the "bad" mean: bad_mean = data.mean() good_mean = bad_mean + np.mean(data - bad_mean) Val On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye <kmichael.aye@gmail.com>wrote:
I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:
2-D data file here: http://dl.dropbox.com/u/139035/data.npy
Then: In [3]: data.mean() Out[3]: 3067.0243839999998
In [4]: data.max() Out[4]: 3052.4343
In [5]: data.shape Out[5]: (1000, 1000)
In [6]: data.min() Out[6]: 3040.498
In [7]: data.dtype Out[7]: dtype('float32')
A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results.
Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.
Best regards, Michael
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Thank you Bruce and all, I knew I was doing something wrong (should have read the mean method doc more closely). Am of course glad that's so easy understandable. But: If the error can get so big, wouldn't it be a better idea for the accumulator to always be of type 'float64' and then convert later to the type of the original array? As one can see in this case, the result would be much closer to the true value. Michael On 2012-01-24 19:01:40 +0000, Val Kalatsky said:
Just what Bruce said.
You can run the following to confirm: np.mean(data - data.mean())
If for some reason you do not want to convert to float64 you can add the result of the previous line to the "bad" mean: bad_mean = data.mean() good_mean = bad_mean + np.mean(data - bad_mean)
Val
On Tue, Jan 24, 2012 at 12:33 PM, K.-Michael Aye <kmichael.aye@gmail.com> wrote: I know I know, that's pretty outrageous to even suggest, but please bear with me, I am stumped as you may be:
2-D data file here: http://dl.dropbox.com/u/139035/data.npy
Then: In [3]: data.mean() Out[3]: 3067.0243839999998
In [4]: data.max() Out[4]: 3052.4343
In [5]: data.shape Out[5]: (1000, 1000)
In [6]: data.min() Out[6]: 3040.498
In [7]: data.dtype Out[7]: dtype('float32')
A mean value calculated per loop over the data gives me 3045.747251076416 I first thought I still misunderstand how data.mean() works, per axis and so on, but did the same with a flattenend version with the same results.
Am I really soo tired that I can't see what I am doing wrong here? For completion, the data was read by a osgeo.gdal dataset method called ReadAsArray() My numpy.__version__ gives me 1.6.1 and my whole setup is based on Enthought's EPD.
Best regards, Michael
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi, Oddly, but numpy 1.6 seems to behave more consistent manner: In []: sys.version Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]' In []: np.version.version Out[]: '1.6.0' In []: d= np.load('data.npy') In []: d.dtype Out[]: dtype('float32') In []: d.mean() Out[]: 3045.7471999999998 In []: d.mean(dtype= np.float32) Out[]: 3045.7471999999998 In []: d.mean(dtype= np.float64) Out[]: 3045.747251076416 In []: (d- d.min()).mean()+ d.min() Out[]: 3045.7472508750002 In []: d.mean(axis= 0).mean() Out[]: 3045.7472499999999 In []: d.mean(axis= 1).mean() Out[]: 3045.7472499999999 Or does the results of calculations depend more on the platform? My 2 cents, eat
On Wed, Jan 25, 2012 at 01:12:06AM +0200, eat wrote:
Or does the results of calculations depend more on the platform?
Floating point operations often do, sadly (not saying that this is the case here, but you'd need to try both versions on the same machine [or at least architecture/bit-width]/same platform to be certain). David
participants (7)
-
Bruce Southey
-
David Warde-Farley
-
eat
-
K.-Michael Aye
-
Kathleen M Tacina
-
Val Kalatsky
-
Zachary Pincus