On Mon, 2003-09-01 at 05:34, Peter Verveer wrote:
Hi All,
I noticed that the sum() and mean() methods of numarrays use the precision of the given array in their calculations. That leads to results like this:
>>> array([255, 255], Int8).sum()
-2
>>> array([255, 255], Int8).mean()
-1.0
Would it not be better to use double precision internally and return the correct result?
Cheers, Peter
Hi Peter,

I thought about this a lot yesterday and today talked it over with Perry. There are several ways to fix the problem with mean() and sum(), and I'm hoping that you and the rest of the community will help sort them out.

(1) The first "solution" is to require users to do their own up-casting prior to calling mean() or sum(). This gives the end user fine control over storage cost but leaves the C-like pitfall/bug you discovered. I mention this because this is how the numarray/Numeric reductions are designed. Is there a reason why the numarray/Numeric reductions don't implicitly up-cast?

(2) The second way is what you proposed: use double precision within mean and sum. This has great simplicity but gives no control over storage usage, and as implemented, the storage would be much higher than one might think, potentially 8x.

(3) Lastly, Perry suggested a more radical approach: rather than changing the mean and sum methods themselves, we could alter the universal function accumulate and reduce methods to implicitly use additional precision. Perry's idea was to make all accumulations and reductions up-cast their results to the largest type of the current family, either Bool, Int64, Float64, or Complex64. By doing this, we can improve the utility of the reductions and accumulations as well as fixing the problem with sum and mean.

--
Todd Miller jmiller@stsci.edu
STSCI / ESS / SSB
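For readers who want to see the three options side by side, here is a minimal sketch. It uses NumPy as a stand-in for numarray, which is an assumption and not the code under discussion. The helper reduce_widened and the WIDEST table are hypothetical names invented for illustration, and the values 100 and 100 are chosen only because their sum does not fit in a signed 8-bit integer.

```python
import numpy as np

# Widest type per family, following option (3) in the message above.
# NumPy's complex128 is assumed to correspond to numarray's Complex64.
WIDEST = {
    "b": np.bool_,
    "i": np.int64,
    "u": np.int64,
    "f": np.float64,
    "c": np.complex128,
}

def reduce_widened(arr):
    """Sum arr with the widest accumulator of its type family (hypothetical helper)."""
    return np.add.reduce(arr, dtype=WIDEST[arr.dtype.kind])

a = np.array([100, 100], dtype=np.int8)

# Accumulating in the array's own precision: 200 wraps around to -56.
narrow = np.add.reduce(a, dtype=np.int8)

# Option 1: the caller up-casts first, at the cost of a temporary copy.
upcast = a.astype(np.int64).sum()

# Option 2: use double precision for the calculation.
as_double = a.astype(np.float64).sum()

# Option 3: the reduction itself chooses the wide accumulator.
widened = reduce_widened(a)

print(narrow, upcast, as_double, widened)
```

In this sketch option 3 passes the accumulator type to the reduction and never builds an up-cast copy of the input, so it avoids the extra storage of options 1 and 2 as written here. Whether numarray's ufunc machinery could do the same is a question for the implementation.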