On Oct 26, 2004, at 6:19 PM, Chris Barker wrote:
Peter Verveer wrote:
It seems to me that the behavior one would expect for a function like that would be to apply the operation to the whole array, not along an axis. What would you expect as a new user if you call a minimum() function? A single value that is the minimum. So that is the logical choice for the default behavior, I would think.
nope. I'd expect it to be along an axis, by default the last one.
I still do not agree completely with that, I will elaborate more below, because I also do not agree anymore with my own earlier writings :-). But I see your point that this type of operation can be natural depending on what you are doing. Sometimes a single value does make sense, sometimes not, I think we can agree on that.
Yes, that would be the idea anyway. The question is what should be the default behavior for this type of function, something I think we should not decide based on the current behavior of a single existing function, but based on what makes the most sense. That is obviously something that can be discussed...
yup, but frankly, this isn't about just one function, it's really about all the reductions: min, max, sum, etc, etc.
Actually no. It seems that sum() is a special case, along with a few others. Again: I elaborate on the general case below.
I think the rule of thumb is not to break backward compatibility unless there is a compelling reason, and given that it's not clear what is most "natural" in this case, keeping the default the same makes the most sense.
I agree. Contrary to what I said before, I think we should keep it as it is, for compatibility.

Now to elaborate on the general problem; please correct me if I get something wrong. I will use the minimum function as an example and come back to sum() later. If you look at a minimum operation, there are three different things you might like to do:

1) An element-by-element minimum: minimum(a1, a2). This is the current behaviour. Like all binary ufuncs of this type, it operates on pairs of arrays, so by default it neither does a reduction nor calculates a single minimum. For most ufuncs that is the natural behaviour anyway.

2) A reduction: minimum.reduce(a1). The reduce method of ufuncs is generally used for reductions. Having to use .reduce makes clear what you are doing. Although a bit odd at first sight, I think it is a clever way to overload ufunc names with different functionality.

3) The minimum of the whole array: in numarray you do a1.min(). I think in Numeric you have to do something like minimum.reduce(a1.flat); correct me if I am wrong. Not nice in either case...

Note that calling a binary ufunc with a single argument gives an error: minimum(a1) raises a TypeError. That seems to be a good decision, because people have different ideas of what should happen: I would expect the minimum of the array, others expect a reduction. Generally, I guess it was a wise decision not to change the meaning of a function depending on whether it has one or two arguments.

The sum() function is an alias for add.reduce. There are a few more of these aliases (e.g. product). I would still say that this is a bit unfortunate, since not everybody may immediately realize that these functions are in fact reductions. I wonder if one would not be better off without these functions at all; after all, you can access the functionality through .reduce(). If you mind the extra typing, just define your own alias. Can't we shift them into numarray.numeric? Just a thought...
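To make the three minimum cases above concrete, here is a minimal sketch in plain Python on 2-D nested lists. It only models the semantics being discussed; it is not numarray or Numeric code, and the helper names elementwise_minimum, minimum_reduce and minimum_all are hypothetical, chosen for illustration.

```python
from functools import reduce

# Hypothetical helpers modelling the three cases on 2-D nested lists.
# They illustrate the semantics only; they are not numarray/Numeric APIs.

def elementwise_minimum(a1, a2):
    """Case 1: pairwise minimum of two equally shaped 2-D lists."""
    return [[min(x, y) for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]

def minimum_reduce(a, axis=0):
    """Case 2: reduce a 2-D list along a single axis."""
    if axis == 0:
        # Collapse the rows: one minimum per column.
        return [reduce(min, col) for col in zip(*a)]
    if axis == 1:
        # Collapse the columns: one minimum per row.
        return [reduce(min, row) for row in a]
    raise ValueError("axis must be 0 or 1 for a 2-D list")

def minimum_all(a):
    """Case 3: a single minimum over every element."""
    return min(x for row in a for x in row)

a1 = [[3, 7, 1], [4, 0, 9]]
a2 = [[5, 2, 6], [1, 8, 2]]

print(elementwise_minimum(a1, a2))  # [[3, 2, 1], [1, 0, 2]]
print(minimum_reduce(a1, axis=0))   # [3, 0, 1]
print(minimum_reduce(a1, axis=1))   # [1, 0]
print(minimum_all(a1))              # 0
```

The three results have different shapes (same shape as the inputs, one dimension fewer, and a scalar), which is why a single-argument minimum(a1) would be ambiguous between cases 2 and 3.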
In any case, these functions clearly need to stay around as they are, for compatibility reasons. It is far more productive to add the functionality that a few people have already proposed: allow reductions over multiple axes. I would welcome that; I always found 1D reductions a bit limited anyway. Obviously you can do sequential 1D reductions, but that can be quite inefficient. As proposed, the axis argument could take a list of dimensions, or 'all' or None.

I would like to propose an additional possibility: like minimum.reduce(), we could have a minimum.all() function that reduces over all dimensions (with a potentially much more efficient implementation). We would not need a sum_all(a1) then; you would use add.all(a1). I guess this could easily be prototyped using sequential reductions; one can worry about efficiency later.

Sorry for the long story...

Cheers,
Peter
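As a rough illustration of the prototype idea above (multi-axis reduction built from sequential 1D reductions), here is a sketch in plain Python on nested lists. The names reduce_axis, reduce_axes, combine and ndim are hypothetical, and axes=None standing for "all dimensions" is an assumption taken from the proposal, not an existing numarray interface.

```python
import operator
from functools import reduce

def combine(op, x, y):
    """Apply binary op elementwise to two equally shaped nested lists (or scalars)."""
    if isinstance(x, list):
        return [combine(op, p, q) for p, q in zip(x, y)]
    return op(x, y)

def reduce_axis(op, a, axis):
    """Reduce nested list a along one axis using binary op."""
    if axis == 0:
        return reduce(lambda x, y: combine(op, x, y), a)
    return [reduce_axis(op, sub, axis - 1) for sub in a]

def ndim(a):
    """Nesting depth of a regular nested list."""
    depth = 0
    while isinstance(a, list):
        a = a[0]
        depth += 1
    return depth

def reduce_axes(op, a, axes=None):
    """Sequential 1D reductions over axes; None means every axis."""
    if axes is None:
        axes = range(ndim(a))
    # Reduce the highest axis first so the remaining axis numbers stay valid.
    for axis in sorted(axes, reverse=True):
        a = reduce_axis(op, a, axis)
    return a

a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]    # shape (2, 2, 2)

print(reduce_axes(min, a))                   # 1
print(reduce_axes(operator.add, a))          # 36
print(reduce_axes(operator.add, a, [0, 2]))  # [14, 22]
```

Each pass builds a full intermediate result, which is the inefficiency mentioned above; a dedicated all-axes reduction could avoid those temporaries.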