Re: [Numpy-discussion] sum and mean methods behaviour

3 Sep 2003

I also believe that the current behavior of the numarray/Numeric reduce method 
(no implicit cast) is the right one. It is fine to leave the user responsible 
for taking care in the case of the reduce operation.

But to calculate a mean or a sum correctly with the array methods that are 
provided, you first have to convert the array to a more precise type and then 
do the calculation. That wastes space, is slow, and seems inelegant 
considering that these are very common statistical operations.
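
To make the cost of converting first concrete, here is a minimal sketch. It uses modern NumPy as a stand-in for numarray (an assumption; the array name and size are made up for illustration):

```python
import numpy as np

# One million 8-bit integers (illustrative data, not from the thread).
a = np.full(1_000_000, 100, dtype=np.int8)

# Converting first creates a full-size double-precision temporary.
converted = a.astype(np.float64)
total = converted.sum()
mean = converted.mean()

print(a.nbytes, converted.nbytes)  # 1000000 8000000
```

The temporary is eight times the size of the input, only to produce a single number.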

A separate implementation for the mean() and sum() methods that uses double 
precision in the calculation without first converting the array would be 
straightforward. Since calculating a mean or a sum of a complete array is 
such a common case I think this would be useful.
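
A rough sketch of what such a separate implementation could look like, written against modern NumPy rather than numarray (an assumption). The function names `sum_in_double` and `mean_in_double` and the `chunk` parameter are hypothetical; a real implementation would presumably do this in C:

```python
import numpy as np

def sum_in_double(a, chunk=4096):
    """Sum all elements of `a`, accumulating in a double-precision float."""
    flat = a.reshape(-1)  # a view when possible; may copy non-contiguous input
    total = 0.0
    for start in range(0, flat.size, chunk):
        # Only `chunk` elements are converted to double at any one time.
        total += float(flat[start:start + chunk].astype(np.float64).sum())
    return total

def mean_in_double(a, chunk=4096):
    """Mean of all elements of `a`, computed from the double-precision sum."""
    return sum_in_double(a, chunk) / a.size

a = np.full(1000, 100, dtype=np.int8)
print(sum_in_double(a), mean_in_double(a))  # 100000.0 100.0
```

The extra storage is bounded by the chunk size instead of growing with the array.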

That leaves the same problem for the reduce method, which in some cases would 
still require a conversion first, but this is much less of a problem (at least 
for me). Having to convert before the operation can be wasteful, though.

I do like the idea that was also proposed on the list to supply an optional 
argument to specify the output type. Then the user has full control of the 
output type (nice if you want high precision in the result without converting 
the input), and the code can easily be used to implement the mean() and sum() 
methods. The default behavior of the reduce method can then remain unchanged, 
so this would not be an obtrusive change. But I imagine that this may 
complicate the implementation.
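
For illustration, this is how an optional output-type argument behaves in modern NumPy, where `ufunc.reduce` accepts a `dtype` keyword. Whether numarray would adopt the same spelling is an assumption, and `mean_via_reduce` is a hypothetical helper:

```python
import numpy as np

a = np.array([127, 127], dtype=np.int8)

# Accumulating in the array's own type wraps around.
same_type = np.add.reduce(a, dtype=np.int8)

# Asking for a wider accumulator leaves the input untouched.
wide = np.add.reduce(a, dtype=np.float64)

def mean_via_reduce(arr, dtype=np.float64):
    """Mean built on reduce with an explicit accumulator type."""
    return np.add.reduce(arr, axis=None, dtype=dtype) / arr.size

print(int(same_type), float(wide), float(mean_via_reduce(a)))  # -2 254.0 127.0
```

The default stays as it is, and mean() and sum() become thin wrappers around the same code path.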

Cheers, Peter

On Wednesday 03 September 2003 17:13, Paul Dubois wrote:
...
So after you get the result in a higher precision, then what?

a. Cast it down blindly?
b. Test every element and throw an exception if casting would lose precision?
c. Test every element and return the smallest kind that "holds" the answer?
d. Always return the highest precision?

a. is close to equivalent to the present behavior.
b. and c. are expensive.
c. makes the type of the result unpredictable, which has its own problems.
d. uses space.

It was the original design of Numeric to be fast rather than careful,
user beware. There is now another considerable portion of the
community that is for very careful, and another that is for keeping it
small. You can't satisfy all those goals at once.

If you make it slow or big in order to be careful, it will always be
slow or big, while the opposite is not true. If you make it fast, the
user can be careful.
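
The four options a. to d. above can be sketched side by side. This is only an illustration in modern NumPy (an assumption, not the numarray/Numeric code), restricted to integer input, and `reduce_with_policy` is a hypothetical name:

```python
import numpy as np

def reduce_with_policy(a, policy):
    """Sum an integer array along axis 0 in int64, then apply option a, b, c or d."""
    wide = np.add.reduce(a, axis=0, dtype=np.int64)
    if policy == "a":  # cast down blindly; out-of-range values wrap silently
        return wide.astype(a.dtype)
    if policy == "b":  # test every element, raise if the cast would lose data
        info = np.iinfo(a.dtype)
        if (wide < info.min).any() or (wide > info.max).any():
            raise OverflowError("result does not fit in " + str(a.dtype))
        return wide.astype(a.dtype)
    if policy == "c":  # smallest type that holds every element of the result
        lo = np.min_scalar_type(int(wide.min()))
        hi = np.min_scalar_type(int(wide.max()))
        return wide.astype(np.result_type(lo, hi))
    if policy == "d":  # always return the highest precision
        return wide
    raise ValueError("unknown policy: " + repr(policy))

a = np.array([[127, 1], [127, 1]], dtype=np.int8)
print(reduce_with_policy(a, "a").tolist())  # [-2, 2]
print(reduce_with_policy(a, "d").tolist())  # [254, 2]
```

Options b. and c. need extra passes over the result, and with c. the result type depends on the data, which matches the objections listed above.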
Todd Miller wrote:
...
On Mon, 2003-09-01 at 05:34, Peter Verveer wrote:
...
Hi All,
I noticed that the sum() and mean() methods of numarrays use the precision of
the given array in their calculations. That leads to results like this:
>>> array([255, 255], Int8).sum()
-2
>>> array([255, 255], Int8).mean()
-1.0
Would it not be better to use double precision internally and return the
correct result?
Cheers, Peter
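
A note on the example above, sketched with modern NumPy as a stand-in for numarray (an assumption; numarray's constructor and defaults may differ). The value 255 does not fit in a signed 8-bit integer, and its bit pattern reads back as -1, so the reported results are consistent with a stored array of [-1, -1]. The second half of the sketch shows wraparound during accumulation, which is the failure the thread is about:

```python
import numpy as np

# The 8-bit pattern for 255, reinterpreted as a signed 8-bit integer.
raw = np.array([255, 255], dtype=np.uint8)
as_int8 = raw.view(np.int8)
print(as_int8.tolist())                            # [-1, -1]
print(int(as_int8.sum()), float(as_int8.mean()))   # -2 -1.0

# Accumulating in the array's own 8-bit type loses the high bits.
narrow = np.add.reduce(raw, dtype=np.uint8)
print(int(narrow))                                 # 254
```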
Hi Peter,
I thought about this a lot yesterday and today talked it over with
Perry. There are several ways to fix the problem with mean() and
sum(), and I'm hoping that you and the rest of the community will help
sort them out.

(1) The first "solution" is to require users to do their own up-casting
prior to calling mean() or sum(). This gives the end user fine control
over storage cost but leaves the C-like pitfall/bug you discovered. I
mention this because this is how the numarray/Numeric reductions are
designed. Is there a reason why the numarray/Numeric reductions don't
implicitly up-cast?

(2) The second way is what you proposed: use double precision within
mean and sum. This has great simplicity but gives no control over
storage usage, and as implemented, the storage would be much higher than
one might think, potentially 8x.

(3) Lastly, Perry suggested a more radical approach: rather than
changing the mean and sum methods themselves, we could alter the
universal function accumulate and reduce methods to implicitly use
additional precision. Perry's idea was to make all accumulations and
reductions up-cast their results to the largest type of the current
family, either Bool, Int64, Float64, or Complex64. By doing this, we
can improve the utility of the reductions and accumulations as well as
fixing the problem with sum and mean.
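
Option (3) could be sketched roughly as follows in modern NumPy (an assumption; this is not the numarray implementation). The `WIDEST` table and both helper names are hypothetical, the unsigned entry is an addition not mentioned above, and numarray's Complex64 is assumed to correspond to NumPy's `complex128`:

```python
import numpy as np

# Widest type of each family, keyed by NumPy dtype kind (hypothetical table).
WIDEST = {
    "b": np.bool_,
    "i": np.int64,
    "u": np.uint64,      # not mentioned in the thread; added for completeness
    "f": np.float64,
    "c": np.complex128,  # assumed equivalent of numarray's Complex64
}

def reduce_upcast(ufunc, a, axis=0):
    """Reduce with the accumulator widened to the largest type of the family."""
    return ufunc.reduce(a, axis=axis, dtype=WIDEST[a.dtype.kind])

def accumulate_upcast(ufunc, a, axis=0):
    """Accumulate with the result widened to the largest type of the family."""
    return ufunc.accumulate(a, axis=axis, dtype=WIDEST[a.dtype.kind])

a = np.array([127, 127], dtype=np.int8)
print(int(reduce_upcast(np.add, a)))           # 254
print(accumulate_upcast(np.add, a).tolist())   # [127, 254]
```

Note that the accumulate result is a full-size array in the wider type, so this choice trades storage for safety on every accumulation.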

Peter Verveer