[Numpy-discussion] Counting array elements

Tue Oct 26 11:20:02 EDT 2004

On Oct 26, 2004, at 6:19 PM, Chris Barker wrote:

> Peter Verveer wrote:
>> It seems to me that the behavior one would expect for a function like 
>> that, would be to apply the operation to the whole array. Not along 
>> an axis. What would you expect as a new user if you call a minimum() 
>> function?  A single value that is the minimum. So that is the logical 
>> choice for the default behavior, I would think.
>
> nope. I'd expect it to be along an axis, by default the last one.

I still do not agree completely with that. I will elaborate below, 
because I also no longer agree with my own earlier writings :-).

But I see your point that this type of operation can be natural 
depending on what you are doing. Sometimes a single value does make 
sense, sometimes not; I think we can agree on that.

>> Yes, that would be the idea anyway. The question is what should be 
>> the default behavior for this type of functions, something I think we 
>> should not decide based on the current behavior of a single existing 
>> function, but based on what makes the most sense. That is obviously 
>> something that can be discussed...
>
> yup, but frankly, this isn't about just one function, it's really 
> about all the reductions: min, max, sum, etc, etc.

Actually no. It seems that sum() is a special case, along with a few 
others. Again: I elaborate on the general case below.

> I think the rule of thumb is not to break backward compatibility 
> unless there is a compelling reason, and given that it's not clear 
> what is most "natural" in this case, keeping the default the same 
> makes the most sense.

I agree. Contrary to what I said before, I think we should keep it 
as it is, for compatibility.

Now to elaborate on the general problem, please correct me if I get 
something wrong. I will use the minimum function as an example and come 
back to sum() later.

If you look at a minimum operation then there are three different 
things you might like to do:

1) An element by element minimum: minimum(a1, a2). This is the current 
behaviour. Like all binary ufuncs of this type, it operates on pairs of 
arrays. So by default it does not do reduction or calculate a single 
minimum. For most ufuncs that is the natural behavior anyway.

2) A reduction: minimum.reduce(a1). The reduce method of ufuncs is 
generally used for reductions. Having to use .reduce makes clear what 
you are doing. Although a bit odd at first sight, I think it is a 
clever way to overload ufuncs names with different functionality.

3) The minimum of the array:  In numarray you do a1.min(). I think in 
Numeric, you have to do something like minimum.reduce(a1.flat), correct 
me if I am wrong. Not nice in either case...
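
A minimal sketch of the three cases above, assuming modern NumPy in place of Numeric or numarray (the ufunc interface shown is NumPy's, and the arrays a1 and a2 are only illustrative):

```python
import numpy as np

a1 = np.array([[4, 1, 7],
               [2, 9, 3]])
a2 = np.array([[5, 0, 6],
               [1, 8, 8]])

# 1) Element-by-element minimum of two arrays (binary ufunc call).
elementwise = np.minimum(a1, a2)

# 2) Reduction along one axis; in NumPy, ufunc.reduce defaults to axis=0.
reduced = np.minimum.reduce(a1)

# 3) A single minimum over the whole array.
overall = a1.min()

# The Numeric-style spelling of 3): reduce over a flattened array.
overall_flat = np.minimum.reduce(a1.ravel())
```

Case 1 keeps the input shape, case 2 drops one axis, and only case 3 returns a single value.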

Note that calling a binary ufunc with a single argument will give an 
error: minimum(a1) raises a TypeError. That seems to be a good 
decision, because people seem to have different ideas of what should 
happen: I would expect the minimum of the array, others expect a 
reduction. Generally I guess it was a wise decision not to change the 
meaning of a function depending on whether it has one or two arguments.
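
The single-argument error can be checked with a short sketch, again assuming modern NumPy as a stand-in for the packages discussed here:

```python
import numpy as np

a1 = np.array([[4, 1, 7],
               [2, 9, 3]])

# Calling the binary ufunc with one array is rejected rather than guessed at.
try:
    np.minimum(a1)
    raised = False
except TypeError:
    raised = True
```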

The sum() function is an alias for add.reduce. There are a few more of 
these aliases (e.g. product). I would still say that it is a bit 
unfortunate, since not everybody may immediately realize that these 
functions are in fact reductions.
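
To illustrate why the alias can surprise people, here is a small sketch assuming modern NumPy. Note that NumPy's own sum() has a different default from the alias described above, so the axis is spelled out where it matters:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# add.reduce works along one axis (axis 0 by default), so the result
# is still an array, not a single number.
column_sums = np.add.reduce(a)

# With the axis spelled out, np.sum gives the same reduction.
same_as_sum = np.sum(a, axis=0)

# Caveat: in NumPy, np.sum(a) without an axis sums every element, which
# differs from the add.reduce alias behaviour described in this message.
total = np.sum(a)
```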

I wonder if one would not be better off without these functions at all, 
after all you can access the functionality through .reduce(). If you 
mind the extra typing, just define your own alias. Can't we shift them 
into numarray.numeric? Just a thought...

In any case, clearly these functions need to stay around as they are 
for compatibility reasons. It is far more productive to add the 
functionality that a few people already proposed: allow reductions over 
multiple axes. I would welcome that, I always found 1D reductions a bit 
limited anyway. Obviously you can do sequential 1D reductions, but that 
can be quite inefficient. As proposed, the axis argument would take 
maybe a list of dimensions, and 'all' or None. I would like to propose 
an additional possibility: like minimum.reduce(), we could have a 
minimum.all() function that reduces over all dimensions (with a 
potentially much more efficient implementation.) We don't need a 
sum_all(a1) then, you would use add.all(a1). I guess this would be 
easily prototyped using sequential reductions; one can worry about 
efficiency later.
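
A rough prototype of that idea, built from sequential 1D reductions and assuming modern NumPy. The names reduce_all and reduce_axes are hypothetical helpers standing in for the proposed minimum.all() and the list-valued axis argument; they are not an existing API:

```python
import numpy as np

def reduce_all(ufunc, a):
    """Hypothetical stand-in for the proposed ufunc.all(): reduce over every axis."""
    result = np.asarray(a)
    # Each pass removes the leading axis until only a scalar is left.
    while result.ndim > 0:
        result = ufunc.reduce(result, axis=0)
    return result

def reduce_axes(ufunc, a, axes):
    """Hypothetical helper: reduce over a list of non-negative axes."""
    result = np.asarray(a)
    # Go from the highest axis down so the remaining axis numbers stay valid.
    for axis in sorted(set(axes), reverse=True):
        result = ufunc.reduce(result, axis=axis)
    return result

a1 = np.arange(24).reshape(2, 3, 4)
smallest = reduce_all(np.minimum, a1)      # plays the role of minimum.all(a1)
total = reduce_all(np.add, a1)             # plays the role of add.all(a1)
partial = reduce_axes(np.add, a1, [0, 2])  # reduce over two of the three axes
```

This is the inefficient sequential version. NumPy's ufunc.reduce itself accepts axis=None or a tuple of axes, so np.minimum.reduce(a1, axis=None) gives the same result as reduce_all in a single call.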

Sorry for the long story...

Cheers, Peter