[Numpy-discussion] sum and mean methods behaviour
verveer at embl-heidelberg.de
Wed Sep 3 08:40:20 EDT 2003
I also believe that the current behavior of the numarray/Numeric reduce method
(not to up-cast) is the right one. It is fine to leave the user with the
responsibility to be careful in the case of the reduce operation.
But to calculate a mean or a sum correctly with the array methods that are
provided, you first have to convert the array to a more precise type and then
do the calculation. That wastes space and time, and does not seem very elegant
considering that these are very common statistical operations.
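A small sketch of the convert-first approach. It assumes Python's standard `array` module as a stand-in for numarray storage (typecode 'b' for a one-byte signed integer like Int8, 'd' for an eight-byte double); it is an illustration, not numarray code. It shows the full-size temporary copy that converting first creates before the sum is taken.

```python
from array import array

# One-byte signed storage, standing in for an Int8 array.
small = array('b', [100, 100, 100])

# Convert first: build a complete double-precision copy of the data...
converted = array('d', small.tolist())

# ...then reduce the copy. The sum itself is now exact.
total = sum(converted)

print(small.itemsize * len(small))          # 3 bytes of input
print(converted.itemsize * len(converted))  # 24 bytes for the temporary copy
print(total)                                # 300.0
```

The temporary grows with the array, which is where the wasted space and time come from.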
A separate implementation of the mean() and sum() methods that uses double
precision in the calculation, without first converting the array, would be
straightforward. Since calculating the mean or the sum of a complete array is
such a common case, I think this would be useful.
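A sketch of what such a separate implementation could look like, again with the standard `array` module standing in for an Int8 array; `sum_double` and `mean_double` are hypothetical names. Each element is read in its stored type and added into a single double-precision accumulator, so no converted copy of the array is made.

```python
from array import array

def sum_double(a):
    """Sum an integer array in double precision without converting it first."""
    total = 0.0          # the only extra storage: one double
    for v in a:          # each element is read in its stored (one-byte) type
        total += v       # ...and added into the double-precision accumulator
    return total

def mean_double(a):
    """Mean computed from the double-precision sum."""
    return sum_double(a) / len(a)

a = array('b', [100, 100, 127])  # Int8-like storage; the sum does not fit in Int8
print(sum_double(a))   # 327.0
print(mean_double(a))  # 109.0
```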
That leaves the same problem for the reduce method, which in some cases would
still require a conversion first, but this is much less of a problem (at least
for me). Having to convert before the operation can be wasteful, though.
I do like the idea, also proposed on the list, of supplying an optional
argument that specifies the output type. Then the user has full control of the
output type (nice if you want high precision in the result without converting
the input), and the same code can easily be used to implement the mean() and
sum() methods. The default behavior of the reduce method can then remain
unchanged, so this would not be an obtrusive change. But I imagine that this
may complicate the implementation.
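A pure-Python sketch of a reduce with an optional output type. `reduce_add`, its `out_type` argument and the small type table are hypothetical, not numarray's API; integer types are emulated by two's-complement wrapping after every addition, roughly the way a C loop over Int8 would behave. With no output type the result keeps the input type, so the default is unchanged, and mean() is written on top of the same reduction.

```python
def _wrap_signed(bits):
    """Return a converter that emulates a signed integer type of the given width."""
    half = 1 << (bits - 1)
    return lambda x: (x + half) % (1 << bits) - half

# Hypothetical table of output types: each entry converts a Python number into that type.
OUTPUT_TYPES = {
    "Int8": _wrap_signed(8),
    "Int64": _wrap_signed(64),
    "Float64": float,
}

def reduce_add(values, in_type, out_type=None):
    """Add-reduce values, accumulating in out_type (default: in_type, as now)."""
    convert = OUTPUT_TYPES[out_type or in_type]
    acc = convert(0)
    for v in values:
        acc = convert(acc + v)  # every partial sum is held in the output type
    return acc

def mean(values, in_type):
    """mean() reuses the reduction, asking for a Float64 result."""
    return reduce_add(values, in_type, out_type="Float64") / len(values)

data = [100, 100]  # values that fit in Int8
print(reduce_add(data, "Int8"))           # -56: default, same as the current behavior
print(reduce_add(data, "Int8", "Int64"))  # 200
print(mean(data, "Int8"))                 # 100.0
```

The input values are never converted; only the accumulator changes type.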
On Wednesday 03 September 2003 17:13, Paul Dubois wrote:
> So after you get the result in a higher precision, then what?
> a. Cast it down blindly?
> b. Test every element and throw an exception if casting would lose
> c. Test every element and return the smallest kind that "holds" the answer?
> d. Always return the highest precision?
> a. is close to equivalent to the present behavior
> b. and c. are expensive.
> c. makes the type of the result unpredictable, which has its own problems.
> d. uses space
> It was the original design of Numeric to be fast rather than careful,
> user beware. There is now another considerable portion of the
> community that is for being very careful, and another that is for keeping it
> small. You can't satisfy all those goals at once.
> If you make it slow or big in order to be careful, it will always be
> slow or big, while the opposite is not true. If you make it fast, the
> user can be careful.
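To make Paul's options a. to d. concrete, here is a sketch applied to a list of results already computed in higher precision (the running sums of three Int8 values of 100), with Int8 as the target type. The function names are illustrative only, and option c. is limited to signed integer types here as a simplifying assumption. Note that b. and c. each need an extra pass over every element, and c. returns a type that depends on the data.

```python
INT8_MIN, INT8_MAX = -128, 127

def cast_down(results):
    """a. Blind cast to Int8: keep the low 8 bits (two's-complement wrap)."""
    return [(x + 128) % 256 - 128 for x in results]

def checked_cast(results):
    """b. Test every element and raise if the cast would lose information."""
    for x in results:
        if not INT8_MIN <= x <= INT8_MAX:
            raise OverflowError(f"{x} does not fit in Int8")
    return list(results)

def smallest_kind(results):
    """c. Name the smallest signed integer type that holds every element."""
    lo, hi = min(results), max(results)
    for bits in (8, 16, 32, 64):
        if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
            return f"Int{bits}"
    raise OverflowError("result does not fit in Int64")

def keep_highest(results):
    """d. Return the high-precision results unchanged."""
    return list(results)

wide = [100, 200, 300]  # running sums of [100, 100, 100], computed without overflow
print(cast_down(wide))      # [100, -56, 44]
print(smallest_kind(wide))  # Int16
print(keep_highest(wide))   # [100, 200, 300]
try:
    checked_cast(wide)
except OverflowError as err:
    print(err)              # 200 does not fit in Int8
```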
> Todd Miller wrote:
> > On Mon, 2003-09-01 at 05:34, Peter Verveer wrote:
> >>Hi All,
> >>I noticed that the sum() and mean() methods of numarrays use the precision of
> >>the given array in their calculations. That leads to results like this:
> >> >>> array([255, 255], Int8).sum()
> >> >>> array([255, 255], Int8).mean()
> >>Would it not be better to use double precision internally and return the
> >>correct result?
> >>Cheers, Peter
> > Hi Peter,
> > I thought about this a lot yesterday and today talked it over with
> > Perry. There are several ways to fix the problem with mean() and
> > sum(), and I'm hoping that you and the rest of the community will help
> > sort them out.
> > (1) The first "solution" is to require users to do their own up-casting
> > prior to calling mean() or sum(). This gives the end user fine control
> > over storage cost but leaves the C-like pitfall/bug you discovered. I
> > mention this because this is how the numarray/Numeric reductions are
> > designed. Is there a reason why the numarray/Numeric reductions don't
> > implicitly up-cast?
> > (2) The second way is what you proposed: use double precision within
> > mean and sum. This has great simplicity but gives no control over
> > storage usage, and as implemented, the storage would be much higher than
> > one might think, potentially 8x.
> > (3) Lastly, Perry suggested a more radical approach: rather than
> > changing the mean and sum methods themselves, we could alter the
> > universal function accumulate and reduce methods to implicitly use
> > additional precision. Perry's idea was to make all accumulations and
> > reductions up-cast their results to the largest type of the current
> > family, either Bool, Int64, Float64, or Complex64. By doing this, we
> > can improve the utility of the reductions and accumulations as well as
> > fixing the problem with sum and mean.
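A sketch of Perry's proposal for add-accumulate. The family table is an assumption built from the four types named above (the exact list of member types is illustrative), and Python int, float and complex values stand in for Int64, Float64 and Complex64 (a complex made of two doubles). `accumulate_upcast` and `reduce_upcast` are hypothetical names; Bool is mapped but not implemented, since adding booleans raises separate questions.

```python
from itertools import accumulate

# Assumed mapping from an input type to the largest type of its family.
LARGEST_IN_FAMILY = {
    "Bool": "Bool",
    "Int8": "Int64", "Int16": "Int64", "Int32": "Int64", "Int64": "Int64",
    "Float32": "Float64", "Float64": "Float64",
    "Complex32": "Complex64", "Complex64": "Complex64",
}

def wrap_int64(x):
    """Emulate Int64 overflow (two's-complement wrap)."""
    return (x + 2**63) % 2**64 - 2**63

def accumulate_upcast(values, in_type):
    """Running sums kept in the largest type of the input's family."""
    out_type = LARGEST_IN_FAMILY[in_type]
    if out_type == "Int64":
        sums = accumulate(values, lambda acc, v: wrap_int64(acc + v))
    elif out_type == "Float64":
        sums = (float(s) for s in accumulate(values))
    elif out_type == "Complex64":
        sums = (complex(s) for s in accumulate(values))
    else:
        raise NotImplementedError(f"no add-accumulate sketched for {out_type}")
    return out_type, list(sums)

def reduce_upcast(values, in_type):
    """For addition, the reduction is the last element of the accumulation."""
    out_type, sums = accumulate_upcast(values, in_type)
    return out_type, sums[-1]

print(accumulate_upcast([100, 100, 100], "Int8"))  # ('Int64', [100, 200, 300])
print(reduce_upcast([0.5, 0.25], "Float32"))       # ('Float64', 0.75)
```

Because the reduction falls out of the accumulation, one up-casting rule covers both, and sum and mean inherit it.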