New subject: numpy.mean still broken for largefloat32arrays

26 Jul 2014

Ray: I'm not working with Hubble data, but yeah, these are all issues I've run into with my terabytes of microscopy data as well. Given that such raw data comes as uint16, it's best to do your calculations as much as possible in good old ints. What you compute is what you get, no obscure shenanigans.
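Here is a minimal sketch of the integer approach. It assumes NumPy and a hypothetical stack of 12-bit camera frames stored as uint16, with the shape and value range made up for illustration. The sums are accumulated in a 64-bit integer, where every partial sum is exact, and the only division happens once at the end.

```python
import numpy as np

# Hypothetical raw data: 8 frames of 128x128 pixels, 12-bit values stored as uint16
rng = np.random.default_rng(0)
frames = rng.integers(0, 4096, size=(8, 128, 128), dtype=np.uint16)

# Request a 64-bit integer accumulator explicitly: integer addition never rounds,
# so the total is exact as long as it stays below 2**64
total = frames.sum(dtype=np.uint64)
n = frames.size

# The single division at the end is the only step that rounds
mean_value = total / n
print(total, mean_value)
```

Passing `dtype` explicitly is a deliberate choice. Without it, NumPy falls back to the platform's default unsigned integer for uint16 input, and depending on the platform and NumPy version that default has not always been 64 bits.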

It just occurred to me that pairwise summation will lead to highly branchy code, and you can forget about any vector extensions. Tradeoffs indeed. Any such hierarchical summation is probably best done by aggregating naively summed blocks. 
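Below is a rough sketch of "aggregating naively summed blocks", assuming NumPy. `naive_sum` and `blocked_sum` are hypothetical helpers, and the block size of 1024 is an arbitrary choice. `np.cumsum` stands in for a single-accumulator loop because it adds strictly left to right, whereas `np.sum` itself may already use a different order.

```python
import numpy as np

def naive_sum(x):
    """Sequential float32 sum with a single accumulator (hypothetical helper)."""
    if len(x) == 0:
        return np.float32(0.0)
    # cumsum adds left to right in float32; its last entry is the running total
    return np.cumsum(x, dtype=np.float32)[-1]

def blocked_sum(x, block=1024):
    """Naive float32 sums over fixed-size blocks, then a naive sum of the block totals."""
    n_full = (len(x) // block) * block
    # Each row is one block, summed naively: the simple inner loop a compiled
    # implementation could vectorize
    rows = x[:n_full].reshape(-1, block)
    block_totals = np.cumsum(rows, axis=1, dtype=np.float32)[:, -1]
    # Leftover elements that do not fill a whole block form one more partial total
    tail_total = naive_sum(x[n_full:])
    return naive_sum(np.append(block_totals, tail_total))

x = np.full(10**6, 0.1, dtype=np.float32)
reference = np.sum(x, dtype=np.float64)  # float64 reference for comparison
print(naive_sum(x), blocked_sum(x), reference)
```

Each running total now only grows to the size of one block, or to the size of the grand total when combining the block totals. This keeps the rounding error well below that of one long float32 accumulation, while the inner loop stays as simple as the naive one.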

-----Original Message-----
From: "RayS" 
Sent: 25-7-2014 23:26
To: "Discussion of Numerical Python" 
Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

At 11:29 AM 7/25/2014, you wrote:
...
On Fri, Jul 25, 2014 at 5:56 PM, RayS  wrote:
...
The important point was that it would be best if all of the methods affected by summing 32-bit floats with 32-bit accumulators had the same Notes as numpy.mean(). We went through a lot of code yesterday, assuming that any numpy or scipy.stats functions that use accumulators suffer the same issue, whether noted or not, and found it true.
Do you have a list of the functions that are affected?
We only tested the few that we use, but found it in
scipy.stats.nanmean, or any .*mean(), and
numpy.sum, mean, average, std, var,...

via something like:

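import numpy
import scipy.stats
print(numpy.__version__)
print(scipy.__version__)
# 2**25 float32 ones: the true mean is exactly 1.0, but a float32 running
# sum stalls at 2**24 (16777216.0 + 1.0 rounds back to 16777216.0)
onez = numpy.ones((2**25, 1), numpy.float32)
step = 2**10
# scipy.stats.nanmean was later deprecated and removed from SciPy;
# with newer versions, use numpy.nanmean here instead
func = scipy.stats.nanmean
# Grow the slice from just below 2**24 and report the first one whose mean is not 1.0
for s in range(2**24 - step, 2**25, step):
    m = func(onez[:s + step])
    if m != 1.:
        print('\nbroke', s, m)
        break
    else:
        print('\r', s, end='')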
...
That said, it does seem that np.mean could be implemented better than
it is, even given float32's inherent limitations. If anyone wants to
implement better algorithms for computing the mean, variance, sums,
etc., then we would love to add them to numpy.
Others have pointed out the possible tradeoffs in summation algos;
perhaps a method arg would be appropriate, with "better" depending on
your desire for speed vs. accuracy.
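Here is a sketch of what such a method argument could look like. `mean_with_method` and its method names are hypothetical, not an existing NumPy or SciPy API. The "kahan" option uses Kahan (compensated) summation, which is only one of several accuracy-oriented choices. The pure-Python loops are written for clarity, not speed.

```python
import numpy as np

def _naive_sum32(x):
    # One float32 accumulator, left to right: cheap, but error grows with len(x)
    total = np.float32(0.0)
    for v in x:
        total += v
    return total

def _kahan_sum32(x):
    # Kahan summation: carry the rounding error of each addition in `comp`
    total = np.float32(0.0)
    comp = np.float32(0.0)
    for v in x:
        y = v - comp
        t = total + y
        comp = (t - total) - y  # low-order bits lost when forming t
        total = t
    return total

def mean_with_method(a, method="naive"):
    """Hypothetical mean() with a selectable speed/accuracy tradeoff."""
    x = np.asarray(a, dtype=np.float32).ravel()
    if method == "naive":
        total = _naive_sum32(x)
    elif method == "kahan":
        total = _kahan_sum32(x)
    elif method == "float64":
        # Wider accumulator, as suggested in the numpy.mean Notes
        total = np.sum(x, dtype=np.float64)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return float(total) / x.size
```

Kahan summation costs a few extra float32 operations per element but keeps a float32 accumulator. The "float64" option trades memory bandwidth in the accumulator for accuracy. The caller, not the library, decides which side of the tradeoff matters.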

It just occurred to me that if the STScI folks (who count photons)
took the mean() or another such function of an image array from the Hubble
sensors to find the background value, they'd better always be using float64.
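For the background-level case, NumPy's documented `dtype` argument to `np.mean` already gives a float64 accumulator without converting the data first. The sketch below assumes a hypothetical flat 1024x1024 float32 frame in which every pixel reads 1000 counts. For contrast, the `np.cumsum` line stands in for a single float32 running sum.

```python
import numpy as np

# Hypothetical flat frame: every pixel reads 1000.0 counts, stored as float32
image = np.full((1024, 1024), 1000.0, dtype=np.float32)

# float64 accumulator: every partial sum is an integer below 2**53, so it is exact
background = np.mean(image, dtype=np.float64)

# Contrast: one float32 running sum over the flattened frame. Above 2**24,
# float32 can no longer represent every integer, and the rounding of each
# "+ 1000" step pushes the total away from the true 1000 * 1024 * 1024.
naive_background = np.cumsum(image, dtype=np.float32)[-1] / image.size

print(background, naive_background)
```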

  - Ray

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

participants (2): Eelco Hoogendoorn, Julian Taylor