[Numpy-discussion] Counting array elements
Peter Verveer
verveer at embl-heidelberg.de
Tue Oct 26 11:20:02 EDT 2004
On Oct 26, 2004, at 6:19 PM, Chris Barker wrote:
> Peter Verveer wrote:
>> It seems to me that the behavior one would expect for a function like
>> that, would be to apply the operation to the whole array. Not along
>> an axis. What would you expect as a new user if you call a minimum()
>> function? A single value that is the minimum. So that is the logical
>> choice for the default behavior, I would think.
>
> nope. I'd expect it to be along an axis, by default the last one.
I still do not completely agree with that. I will elaborate below,
because I no longer agree with my own earlier writings either :-).
But I see your point that this type of operation can be natural
depending on what you are doing. Sometimes a single value makes
sense and sometimes it does not; I think we can agree on that.
>> Yes, that would be the idea anyway. The question is what should be
>> the default behavior for this type of functions, something I think we
>> should not decide based on the current behavior of a single existing
>> function, but based on what makes the most sense. That is obviously
>> something that can be discussed...
>
> yup, but frankly, this isn't about just one function, it's really
> about all the reductions: min, max, sum, etc, etc.
Actually no. It seems that sum() is a special case, along with a few
others. Again: I elaborate on the general case below.
> I think the rule of thumb is not to break backward compatibility
> unless there is a compelling reason, and given that it's not clear
> what is most "natural" in this case, keeping the default the same
> makes the most sense.
I agree. Contrary to what I said before, I think we should keep it
as it is, for compatibility.
Now to elaborate on the general problem; please correct me if I get
something wrong. I will use the minimum function as an example and come
back to sum() later.
If you look at a minimum operation then there are three different
things you might like to do:
1) An element by element minimum: minimum(a1, a2). This is the current
behaviour. Like all binary ufuncs of this type, it operates on pairs of
arrays. So by default it does not do reduction or calculate a single
minimum. For most ufuncs that is the natural behavior anyway.
2) A reduction: minimum.reduce(a1). The reduce method of ufuncs is
generally used for reductions. Having to use .reduce makes clear what
you are doing. Although a bit odd at first sight, I think it is a
clever way to overload ufuncs names with different functionality.
3) The minimum of the array: in numarray you do a1.min(). I think in
Numeric you have to do something like minimum.reduce(a1.flat); correct
me if I am wrong. Not nice in either case...
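For readers following along with present-day NumPy (an assumption; the
thread itself concerns Numeric and numarray, whose defaults may differ),
the three operations listed above can be sketched roughly as follows.
The arrays a1 and a2 are made-up examples.

```python
import numpy as np

# Hypothetical example arrays, not taken from the thread.
a1 = np.array([[3, 7, 1],
               [4, 0, 9]])
a2 = np.array([[2, 8, 5],
               [6, 0, 1]])

# 1) Element-by-element minimum of two arrays (binary ufunc call).
elementwise = np.minimum(a1, a2)

# 2) Reduction along one axis via the ufunc's reduce method.
#    In current NumPy the default axis for reduce is 0.
reduced = np.minimum.reduce(a1)

# 3) Minimum of the whole array: either the array method, or a
#    reduce over the flattened data.
overall = a1.min()
overall_flat = np.minimum.reduce(a1.ravel())
```

Here the first call keeps the input shape, the second drops one
dimension, and the third yields a single value.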
Note that calling a binary ufunc with a single argument will give an
error: minimum(a1) raises a TypeError. That seems to be a good
decision, because people seem to have different ideas of what should
happen: I would expect the minimum of the array, others expect a
reduction. Generally I guess it was a wise decision not to change the
meaning of a function depending on whether it has one or two arguments.
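A quick check of the single-argument case, assuming present-day NumPy
rather than the Numeric/numarray versions discussed here. The exact
exception type may depend on the release, so this sketch catches both
TypeError and ValueError instead of asserting one of them.

```python
import numpy as np

a1 = np.array([3, 7, 1])  # hypothetical example array

try:
    np.minimum(a1)  # binary ufunc called with only one argument
    raised = None
except (TypeError, ValueError) as exc:
    # Recent NumPy releases raise TypeError here; some older ones may
    # have raised ValueError, so both are caught (an assumption).
    raised = type(exc).__name__
```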
The sum() function is an alias for add.reduce. There are a few more of
these aliases (e.g. product). I would still say that it is a bit
unfortunate, since not everybody may immediately realize that these
functions are in fact reductions.
I wonder if one would not be better off without these functions at all.
After all, you can access the functionality through .reduce(). If you
mind the extra typing, just define your own alias. Can't we shift them
into numarray.numeric? Just a thought...
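Defining such an alias yourself is a one-liner. The sketch below uses
present-day NumPy as a stand-in (an assumption), and the name my_sum is
made up for illustration.

```python
import numpy as np

a1 = np.arange(6).reshape(2, 3)  # hypothetical example array

# A home-made alias for the reduction; the name my_sum is made up.
my_sum = np.add.reduce

along_axis0 = my_sum(a1)          # reduce defaults to axis 0
same_as_sum = np.sum(a1, axis=0)  # explicit axis, for comparison
```

Note that in present-day NumPy, sum() called without an axis argument
sums over all elements, so there it is no longer a plain alias in the
sense described in this thread.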
In any case, these functions clearly need to stay around as they are
for compatibility reasons. It would be far more productive to add the
functionality that a few people have already proposed: allow reductions
over multiple axes. I would welcome that; I always found 1D reductions
a bit limited anyway. Obviously you can do sequential 1D reductions,
but that can be quite inefficient. As proposed, the axis argument could
take a list of dimensions, or 'all' or None. I would like to propose
an additional possibility: like minimum.reduce(), we could have a
minimum.all() function that reduces over all dimensions (with a
potentially much more efficient implementation). We would not need a
sum_all(a1) then; you would use add.all(a1). I guess this could
easily be prototyped using sequential reductions; one can worry about
efficiency later.
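A minimal sketch of such a prototype built from sequential 1D
reductions, written against present-day NumPy (an assumption). The
helper name reduce_axes and its axes parameter are hypothetical; they
are not an existing or proposed API.

```python
import numpy as np

def reduce_axes(ufunc, arr, axes=None):
    """Reduce arr with a binary ufunc over several axes, one at a time.

    Hypothetical helper for illustration; axes=None means all axes.
    """
    arr = np.asarray(arr)
    if axes is None:
        axes = range(arr.ndim)
    # Normalize negative axes, then reduce the highest axis first so
    # that the remaining axis numbers stay valid after each step.
    axes = sorted({ax % arr.ndim for ax in axes}, reverse=True)
    result = arr
    for ax in axes:
        result = ufunc.reduce(result, axis=ax)
    return result

a1 = np.arange(24).reshape(2, 3, 4)  # hypothetical example array

total = reduce_axes(np.add, a1)                     # all axes
partial = reduce_axes(np.minimum, a1, axes=[0, 2])  # two of three axes
```

Present-day NumPy accepts axis=None or a tuple of axes in the reduce
method of a ufunc, which covers the same ground without the
intermediate arrays this prototype creates.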
Sorry for the long story...
Cheers, Peter