Re: [Numpy-discussion] Counting array elements
Stephen Walton wrote:
There is a difference between the sum() Ufunc and the sum() method which is not mentioned in the documentation: the function works along an axis, while the method works on the whole array. That is, A.sum() and A.flat.sum() are equivalent regardless of the rank of A.
Bummer. I was hoping this was a move to a more object-oriented style, rather than different functionality. Also, it's pretty confusing terminology, particularly if it's not documented! Why not .SumAll() or something? -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
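A minimal sketch of the difference Stephen describes, assuming plain nested lists in place of numarray arrays. The helper names `sum_axis0` and `sum_all` are hypothetical and only mimic the two behaviours; they are not numarray calls.

```python
# Plain nested lists stand in for a rank-2 array; no numarray call is used.
A = [[1, 2, 3],
     [4, 5, 6]]

def sum_axis0(rows):
    """Sum down the first axis: one total per column (function-style)."""
    return [sum(column) for column in zip(*rows)]

def sum_all(rows):
    """Sum every element (what the thread says the method does)."""
    return sum(value for row in rows for value in row)

column_totals = sum_axis0(A)   # [5, 7, 9]
grand_total = sum_all(A)       # 21
```

The same input gives a rank-1 result in one case and a scalar in the other, which is the surprise being discussed.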
At 11:02 AM -0700 2004-10-22, Chris Barker wrote:
Stephen Walton wrote:
There is a difference between the sum() Ufunc and the sum() method which is not mentioned in the documentation: the function works along an axis, while the method works on the whole array. That is, A.sum() and A.flat.sum() are equivalent regardless of the rank of A.
Bummer. I was hoping this was a move to a more object-oriented style, rather than different functionality. Also, it's pretty confusing terminology, particularly if it's not documented! Why not .SumAll() or something?
I agree. Numarray is already confusing enough without identically named functions and methods that do different things. (nElements and size are another pet peeve, with size used in several places and nElements appearing exactly once. Though I am grateful to whoever added size as a workalike for nElements; formerly you had to know what kind of array you had before you knew how to find out how many elements it had.) -- Russell
On Fri, 2004-10-22 at 14:17, Russell E Owen wrote:
At 11:02 AM -0700 2004-10-22, Chris Barker wrote:
Stephen Walton wrote:
There is a difference between the sum() Ufunc and the sum() method which is not mentioned in the documentation: the function works along an axis, while the method works on the whole array. That is, A.sum() and A.flat.sum() are equivalent regardless of the rank of A.
Bummer. I was hoping this was a move to a more object-oriented style, rather than different functionality. Also, it's pretty confusing terminology, particularly if it's not documented! Why not .SumAll() or something?
sumAll() would certainly be better. Unless there are objections, I'll rename the current sum() method to sumAll() and re-write sum() to give a deprecation warning before calling sumAll(). Eventually, it'll go away altogether. I reviewed the discussion of the sum() result type from a year ago: "[Numpy-discussion] sum and mean methods behaviour". We discussed sum() in depth and AFAIK I implemented the recommendations. The results need to be documented. By default, sum() now uses the maximum type of the type family of the array, so families Bool, Integer, UnsignedInteger, Float, or Complex result in max types Bool, Int64, UInt64, Float64, Complex64. I'm not sure why we segregated Bool and it looks like a mistake to me now. I'm thinking the Bool "family" should just go away and be re-classified as UnsignedInteger. These ideas are captured by the numerictypes.MaximumType() function which is also potentially useful for any reduction.
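The rename-with-deprecation plan could look roughly like the following. This is only a sketch under stated assumptions: `ToyArray` is a hypothetical stand-in class, not numarray's array type, and the warning text is invented for illustration.

```python
import warnings

class ToyArray:
    """Hypothetical stand-in for an array class; not numarray's own type."""

    def __init__(self, rows):
        self.rows = rows  # rank-2 data held as a list of lists

    def sumAll(self):
        """Total of every element, regardless of axis."""
        return sum(value for row in self.rows for value in row)

    def sum(self):
        """Old spelling: emit a deprecation warning, then delegate."""
        warnings.warn("sum() is deprecated; use sumAll()",
                      DeprecationWarning, stacklevel=2)
        return self.sumAll()

# Record warnings so the example shows that one was actually raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    total = ToyArray([[1, 2], [3, 4]]).sum()

warned = [w.category.__name__ for w in caught]
```

Existing callers keep working during the transition, but each call to the old name is flagged.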
I agree. Numarray is already confusing enough without identically named functions and methods that do different things.
True enough. This'll be fixed.
(nElements and size are another pet peeve, with size used in several places and nElements appearing exactly once. Though I am grateful to whoever added size as a workalike for nElements; formerly you had to know what kind of array you had before you knew how to find out how many elements it had.)
I'm not sure what you mean here. When I grepped, I got 52 hits for nelements() in the numarray source, let alone what users have done with it. Right now, IMHO, it's not clearly broken and there are bigger fish to fry. Regards, Todd
At 5:17 PM -0400 2004-10-22, Todd Miller wrote:
On Fri, 2004-10-22 at 14:17, Russell E Owen wrote:
I agree. Numarray is already confusing enough without identically named functions and methods that do different things.
True enough. This'll be fixed.
Great!
(nElements and size are another pet peeve, with size used in several places and nElements appearing exactly once. Though I am grateful to whoever added size as a workalike for nElements; formerly you had to know what kind of array you had before you knew how to find out how many elements it had.)
I'm not sure what you mean here. When I grepped, I got 52 hits for nelements() in the numarray source, let alone what users have done with it. Right now, IMHO, it's not clearly broken and there are bigger fish to fry.
Since you ask... I'm counting the number of implementations in the public interface of the numarray package. There are four implementations of size (including the numarray array method, which is simply a synonym for nelements), but only one implementation of nelements. When I started using numarray, the following was true:
* numarray had a function named size.
* numarray.ma had the same function
* numarray.ma arrays had method size
* All of these worked the same way:
    size(array, axis=None)
  size returns the number of elements in an array or along the specified axis.
BUT numarray arrays had no method size. Instead there was a method nelements, which did the same thing as size, but had no "axis" argument. This was very confusing, and I got tripped up badly because I was trying to count array elements and was using both "normal" numarray arrays and masked arrays. I filed PR 934514 and some kind soul patched the problem by making size a synonym for nelements. There is a bit of residual mess because the new size does not have the axis argument. And then there's the historical clutter of two ways to do the same thing, but presumably one just lives with that. Though it seems a bit strange to me not to deprecate nelements and stop using it internally. -- Russell
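For readers unfamiliar with the size(array, axis=None) convention Russell refers to, here is a minimal sketch, assuming rectangular nested lists stand in for arrays. `shape_of` is a hypothetical helper, and this `size` only imitates the described behaviour; it is not the numarray implementation.

```python
def shape_of(nested):
    """Shape of a rectangular nested list, e.g. a 2x3 list gives (2, 3)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)

def size(nested, axis=None):
    """Number of elements in total, or the extent along one axis."""
    shape = shape_of(nested)
    if axis is None:
        total = 1
        for extent in shape:
            total *= extent
        return total
    return shape[axis]

A = [[1, 2, 3], [4, 5, 6]]
n_total = size(A)          # 6 elements overall
n_rows = size(A, axis=0)   # 2 along the first axis
n_cols = size(A, axis=1)   # 3 along the second axis
```

A method without the axis argument can only give the first of these three answers, which is the residual mismatch mentioned above.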
Todd Miller wrote:
sumAll() would certainly be better.
Unless there are objections, I'll rename the current sum() method to sumAll() and re-write sum() to give a deprecation warning before calling sumAll(). Eventually, it'll go away altogether.
silly, minor nit: can we avoid mixed case names? Either sum_all or SumAll? I'm not too fond of CamelCase, but camelCase looks even worse to me :) As I said, it's just a minor nit. I don't know if there's an official naming policy for numarray, so please don't get angry at me if my comment is out of place. Best, f
On Fri, 2004-10-22 at 14:47, Fernando Perez wrote:
silly, minor nit: can we avoid mixed case names? Either sum_all or SumAll? I'm not too fond of CamelCase, but camelCase looks even worse to me :)
I agree with Fernando about CamelCase (which among other things seriously bites one when moving from case-sensitive to case-insensitive OS's). But I want to make a broader point: I don't think we need sumall. The methods and the functions should simply work the same way. If one wants sumall, use A.flat.sum() or, if you can't use the methods or attributes on your old version of Python, sum(ravel(A)). If you start writing sumall, then you'll need meanall, stdall, prodall, etc, etc. The flat attribute and ravel function/method already provide all the needed functionality. Just trying to save Todd some work. Steve
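Steve's point that one flattening step serves every reduction can be sketched as follows. This assumes nested lists in place of arrays; the `ravel` below is a toy re-implementation written for this note (and it copies the data), not the Numeric or numarray function.

```python
def ravel(nested):
    """Return all elements of a nested list as one flat list, row-major."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for child in nested:
        flat.extend(ravel(child))
    return flat

A = [[1, 2, 3], [4, 5, 6]]
flat = ravel(A)

# Ordinary 1-D reductions now cover the "all elements" case
# without needing sumall, meanall, maxall, and so on.
total = sum(flat)
mean = sum(flat) / len(flat)
largest = max(flat)
```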
On 25 Oct 2004, at 04:17, Stephen Walton wrote:
On Fri, 2004-10-22 at 14:47, Fernando Perez wrote:
silly, minor nit: can we avoid mixed case names? Either sum_all or SumAll? I'm not too fond of CamelCase, but camelCase looks even worse to me :)
I agree with Fernando about CamelCase (which among other things seriously bites one when moving from case-sensitive to case-insensitive OS's). But I want to make a broader point:
I don't think we need sumall. The methods and the functions should simply work the same way. If one wants sumall, use A.flat.sum() or, if you can't use the methods or attributes on your old version of Python, sum(ravel(A)). If you start writing sumall, then you'll need meanall, stdall, prodall, etc, etc. The flat attribute and ravel function/method already provide all the needed functionality.
I think this may be inefficient, because ravel and flat may make a copy of the data. Also I think using flat/ravel in such a way is plain ugly and a complex way to do it. But I do agree that it is not a good idea to introduce another set of names. In my opinion functions that calculate a statistic like sum should return the total in the first place, rather than over a single axis. But I guess it is too late to change that for sum, because of backward compatibility. Cheers, Peter
On Mon, 2004-10-25 at 10:26 +0200, Peter Verveer wrote:
On 25 Oct 2004, at 04:17, Stephen Walton wrote:
I don't think we need sumall. The methods and the functions should simply work the same way. If one wants sumall, use A.flat.sum() or, if you can't use the methods or attributes on your old version of Python, sum(ravel(A)).
I think this may be inefficient, because ravel and flat may make a copy of the data. Also I think using flat/ravel in such a way is plain ugly and a complex way to do it.
You may be right about the copying, I couldn't say. I don't think sum(ravel(A)) looks any worse than sum(sum(sum(A))) for a rank 3 array, but ugly is in the eye of the beholder.
In my opinion functions that calculate a statistic like sum should return the total in the first place, rather than over a single axis.
It depends on the data. I use rank-2 arrays which are images and are therefore homogeneous. Even there, though, I often want the sum of all rows or all columns. For heterogeneous data (e.g., columns of different Y's as a function of X), the present sum() makes sense. In other words, we will always need ways to sum over just one dimension and over all dimensions. By analogy with MATLAB (I'm guessing), sum() in Numeric and numarray does a one-D sum. -- Stephen Walton, Professor of Physics and Astronomy, California State University, Northridge stephen.walton@csun.edu
Stephen Walton wrote:
On Mon, 2004-10-25 at 10:26 +0200, Peter Verveer wrote:
On 25 Oct 2004, at 04:17, Stephen Walton wrote:
I don't think we need sumall. The methods and the functions should simply work the same way. If one wants sumall, use A.flat.sum() or, if you can't use the methods or attributes on your old version of Python, sum(ravel(A)).
I think this may be inefficient, because ravel and flat may make a copy of the data. Also I think using flat/ravel in such a way is plain ugly and a complex way to do it.
You may be right about the copying, I couldn't say. I don't think sum(ravel(A)) looks any worse than sum(sum(sum(A))) for a rank 3 array, but ugly is in the eye of the beholder.
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion. -tim
In my opinion functions that calculate a statistic like sum should return the total in the first place, rather than over a single axis.
It depends on the data. I use rank-2 arrays which are images and are therefore homogeneous. Even there, though, I often want the sum of all rows or all columns. For heterogeneous data (e.g., columns of different Y's as a function of X), the present sum() makes sense. In other words, we will always need ways to sum over just one dimension and over all dimensions. By analogy with MATLAB (I'm guessing), sum() in Numeric and numarray does a one-D sum.
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method? I like the concept of .flat, but I regularly call functions with arguments that may-or-may-not be contiguous. For robustness, such functions _must_ be coded with ravel() because .flat fails for non-contiguous arrays. I never fully understood why there were two ways of "flattening" in the first place. Gary -------------------------------------------------------------- Gary Strangman, PhD | Director, Neural Systems Group Office: 617-724-0662 | Massachusetts General Hospital Fax: 617-726-4078 | 149 13th Street, Ste 10018 | Charlestown, MA 02129
On 25 Oct 2004, at 18:51, Gary Strangman wrote:
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method?
I believe it is not possible without copying data. The strides between elements of a noncontiguous array are not always the same, so you cannot efficiently view it as a 1D array.
I like the concept of .flat, but I regularly call functions with arguments that may-or-may-not be contiguous. For robustness, such functions _must_ be coded with ravel() because .flat fails for non-contiguous arrays.
Functions should be coded in the first place to take multi-dimensional nature into account in my opinion. One of the points of numarray is that it is multi-dimensional. If a function can work over multiple dimensions, but it only works for 1D arrays, it is broken in my opinion. In my opinion sum() _is_ broken, and introducing a separate sum_all() is an ugly hack.
I never fully understood why there were two ways of "flattening" in the first place.
I suppose it is for efficiency reasons: flat may not always work, but when it does, it is efficient since it does not need to copy any data. Peter
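Peter's stride argument can be made concrete with a small check. This is a sketch, assuming an array is described only by a shape tuple and a tuple of byte strides; `flat_stride` is a hypothetical helper written for this note, not part of numarray or Numeric, and it ignores empty arrays.

```python
def flat_stride(shape, strides):
    """Return the single stride of a row-major 1-D view, or None.

    A 1-D view without copying exists only if stepping off the end of
    each axis lands exactly on the next element of the axis before it,
    i.e. strides[i] == strides[i + 1] * shape[i + 1] for adjacent axes.
    """
    # Axes of length 1 are never stepped along, so they add no constraint.
    dims = [(n, s) for n, s in zip(shape, strides) if n != 1]
    if not dims:
        return 0  # at most one element: any stride works
    for (_, outer), (n_inner, inner) in zip(dims, dims[1:]):
        if outer != inner * n_inner:
            return None
    return dims[-1][1]

# A contiguous 3x4 array of 8-byte items: rows are 32 bytes apart.
contiguous = flat_stride((3, 4), (32, 8))   # 8

# The first three columns of that array: each row ends with a gap.
sliced = flat_stride((3, 3), (32, 8))       # None

# The transpose of that array: axes swapped, strides swapped.
transposed = flat_stride((4, 3), (8, 32))   # None
```

In the two `None` cases the step between consecutive elements is not constant, so a flat view would need either a copy or a more general iteration scheme.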
At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
On 25 Oct 2004, at 18:51, Gary Strangman wrote:
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method?
I believe it is not possible without copying data. The strides between elements of a noncontiguous array are not always the same, so you cannot efficiently view it as a 1D array.
How about providing an iterator that counts through all the elements of an array (e.g. arr.itervalues()). So long as C extensions could efficiently make use of such an iterator, I think it'd do the job. One could also imagine:
- arr.iteritems(), which returned (index, value) for each item
- a mask argument: a boolean array the same shape as the data array; True means elide the corresponding value from the data array
- general support for indexing
More generally, I agree that sum should work the same as a function and a method, and that an extra axis argument could be a good thing (it is so common elsewhere, e.g. size). I'd be tempted to break backwards compatibility to fix this, since numarray is still new and the current situation is very confusing. -- Russell
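The iterator idea might be sketched at the Python level like this, assuming nested lists stand in for arrays. The function names follow Russell's suggested spellings, but they are written here as hypothetical free functions rather than array methods, and nothing below exists in numarray.

```python
def iteritems(nested, index=()):
    """Yield (index_tuple, value) for every element, in row-major order."""
    if isinstance(nested, list):
        for i, child in enumerate(nested):
            yield from iteritems(child, index + (i,))
    else:
        yield index, nested

def itervalues(nested, mask=None):
    """Yield values in row-major order; a True mask entry elides the value."""
    if mask is None:
        for _, value in iteritems(nested):
            yield value
    else:
        for (_, value), (_, hidden) in zip(iteritems(nested), iteritems(mask)):
            if not hidden:
                yield value

data = [[1, 2], [3, 4]]
mask = [[False, True], [False, False]]

pairs = list(iteritems(data))
masked_total = sum(itervalues(data, mask))   # 1 + 3 + 4
```

Each element costs a Python-level call in this form, so it illustrates the interface rather than an efficient implementation.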
On 25 Oct 2004, at 19:32, Russell E Owen wrote:
At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
On 25 Oct 2004, at 18:51, Gary Strangman wrote:
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method?
I believe it is not possible without copying data. The strides between elements of a noncontiguous array are not always the same, so you cannot efficiently view it as a 1D array.
How about providing an iterator that counts through all the elements of an array (e.g. arr.itervalues()). So long as C extensions could efficiently make use of such an iterator, I think it'd do the job.
It would still be slower, because you would need a function call at each element that returns a value. Not a problem if you do a lot of work at each element, but if you are just adding values you want a custom written C function. You can do it at the C level with macros or so (I do that in nd_image), but that would not help at the Python level.
One could also imagine:
- arr.iteritems(), which returned (index, value) for each item
- a mask argument: a boolean array the same shape as the data array; True means elide the corresponding value from the data array
- general support for indexing
Essentially you are suggesting to expose iterators at the python level that iterate over an array in some predefined way. That is possible, but I doubt it will be efficient. At the C level however, it might be worth thinking about as a way of easing writing functions in C. I proposed to do it the other way around in an earlier mail: providing a set of generic functions that take a python or a C function to be applied at each element. I most likely will implement something in that direction, but I should give your idea also some thought.
More generally, I agree that sum should work the same as a function and a method, and that an extra axis argument could be a good thing (it is so common elsewhere, e.g. size). I'd be tempted to break backwards compatibility to fix this, since numarray is still new and the current situation is very confusing.
I would absolutely vote for such a change. Simply because we would like a range of such functions, e.g. minimum, maximum, and so on. Even if we have to leave sum() as it is, I think we should have the alternatives, we would just have to come up with an alternative name for sum(). In fact I would consider volunteering implementing these functions. Peter
Peter Verveer wrote:
On 25 Oct 2004, at 19:32, Russell E Owen wrote:
At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
On 25 Oct 2004, at 18:51, Gary Strangman wrote:
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method?
I believe it is not possible without copying data. The strides between elements of a noncontiguous array are not always the same, so you cannot efficiently view it as a 1D array.
How about providing an iterator that counts through all the elements of an array (e.g. arr.itervalues()). So long as C extensions could efficiently make use of such an iterator, I think it'd do the job.
It would still be slower, because you would need a function call at each element that returns a value. Not a problem if you do a lot of work at each element, but if you are just adding values you want a custom written C function. You can do it at the C level with macros or so (I do that in nd_image), but that would not help at the Python level.
One could also imagine:
- arr.iteritems(), which returned (index, value) for each item
- a mask argument: a boolean array the same shape as the data array; True means elide the corresponding value from the data array
- general support for indexing
Essentially you are suggesting to expose iterators at the python level that iterate over an array in some predefined way. That is possible, but I doubt it will be efficient.
At the C level however, it might be worth thinking about as a way of easing writing functions in C. I proposed to do it the other way around in an earlier mail: providing a set of generic functions that take a python or a C function to be applied at each element. I most likely will implement something in that direction, but I should give your idea also some thought.
More generally, I agree that sum should work the same as a function and a method, and that an extra axis argument could be a good thing (it is so common elsewhere, e.g. size). I'd be tempted to break backwards compatibility to fix this, since numarray is still new and the current situation is very confusing.
I would absolutely vote for such a change. Simply because we would like a range of such functions, e.g. minimum, maximum, and so on. Even if we have to leave sum() as it is, I think we should have the alternatives, we would just have to come up with an alternative name for sum(). In fact I would consider volunteering implementing these functions.
Why the need to break backwards compatibility? If one is going to reimplement sum, et al. so as to operate on an arbitrary set of axes there's no reason one couldn't maintain the current behaviour as the default. All that is required is to allow axis to be a number (current behaviour), a tuple (reduce across the designated axes) or some special value to sum over all (None?, "all"?). Having two sum functions with different names is not particularly better than the current proposal of a method and a function. -tim
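Tim's proposal of an axis argument that accepts a number, a tuple, or a special "all" value can be sketched as below. Assumptions: rectangular nested lists stand in for arrays, None is picked as the special value, and the default of axis=0 is chosen here only to represent "the current behaviour"; `rank`, `sum_axis`, and `reduce_sum` are hypothetical names.

```python
def rank(nested):
    """Number of axes of a rectangular nested list."""
    depth = 0
    while isinstance(nested, list):
        depth += 1
        nested = nested[0]
    return depth

def sum_axis(nested, axis):
    """Sum along one axis, returning a result with that axis removed."""
    if axis == 0:
        if isinstance(nested[0], list):
            # Add the sub-arrays together element by element.
            return [sum_axis(list(group), 0) for group in zip(*nested)]
        return sum(nested)
    return [sum_axis(child, axis - 1) for child in nested]

def reduce_sum(nested, axis=0):
    """axis: an int (one axis), a tuple of ints, or None (all axes)."""
    if axis is None:
        axes = range(rank(nested))
    elif isinstance(axis, int):
        axes = [axis]
    else:
        axes = axis
    # Reduce the highest axis first so remaining axis numbers stay valid.
    for ax in sorted(axes, reverse=True):
        nested = sum_axis(nested, ax)
    return nested

A = [[1, 2, 3], [4, 5, 6]]
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

default_result = reduce_sum(A)              # one axis, as before
everything = reduce_sum(A, axis=None)       # all axes
two_axes = reduce_sum(cube, axis=(0, 2))    # a designated subset
```

One function name then covers all three uses, and callers who never pass axis see no change.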
On Oct 25, 2004, at 11:02 PM, Tim Hochberg wrote:
Peter Verveer wrote:
On 25 Oct 2004, at 19:32, Russell E Owen wrote:
At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
On 25 Oct 2004, at 18:51, Gary Strangman wrote:
I'm not sure how feasible it is, but I'd much rather have an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method?
I believe it is not possible without copying data. The strides between elements of a noncontiguous array are not always the same, so you cannot efficiently view it as a 1D array.
How about providing an iterator that counts through all the elements of an array (e.g. arr.itervalues()). So long as C extensions could efficiently make use of such an iterator, I think it'd do the job.
It would still be slower, because you would need a function call at each element that returns a value. Not a problem if you do a lot of work at each element, but if you are just adding values you want a custom written C function. You can do it at the C level with macros or so (I do that in nd_image), but that would not help at the Python level.
One could also imagine:
- arr.iteritems(), which returned (index, value) for each item
- a mask argument: a boolean array the same shape as the data array; True means elide the corresponding value from the data array
- general support for indexing
Essentially you are suggesting to expose iterators at the python level that iterate over an array in some predefined way. That is possible, but I doubt it will be efficient.
At the C level however, it might be worth thinking about as a way of easing writing functions in C. I proposed to do it the other way around in an earlier mail: providing a set of generic functions that take a python or a C function to be applied at each element. I most likely will implement something in that direction, but I should give your idea also some thought.
More generally, I agree that sum should work the same as a function and a method, and that an extra axis argument could be a good thing (it is so common elsewhere, e.g. size). I'd be tempted to break backwards compatibility to fix this, since numarray is still new and the current situation is very confusing.
I would absolutely vote for such a change. Simply because we would like a range of such functions, e.g. minimum, maximum, and so on. Even if we have to leave sum() as it is, I think we should have the alternatives, we would just have to come up with an alternative name for sum(). In fact I would consider volunteering implementing these functions.
Why the need to break backwards compatibility? If one is going to reimplement sum, et al. so as to operate on an arbitrary set of axes there's no reason one couldn't maintain the current behaviour as the default.
It seems to me that the behavior one would expect for a function like that, would be to apply the operation to the whole array. Not along an axis. What would you expect as a new user if you call a minimum() function? A single value that is the minimum. So that is the logical choice for the default behavior, I would think.
All that is required is to allow axis to be a number (current behaviour), a tuple (reduce across the designated axes) or some special value to sum over all (None?, "all"?).
Yes, that would be the idea anyway. The question is what should be the default behavior for this type of functions, something I think we should not decide based on the current behavior of a single existing function, but based on what makes the most sense. That is obviously something that can be discussed...
Having two sum functions with different names is not particularly better than the current proposal of a method and a function.
This is certainly true. I would prefer breaking compatibility... Peter
Peter Verveer wrote:
On Oct 25, 2004, at 11:02 PM, Tim Hochberg wrote:
Why the need to break backwards compatibility? If one is going to reimplement sum, et al. so as to operate on an arbitrary set of axes there's no reason one couldn't maintain the current behaviour as the default.
Great idea!
It seems to me that the behavior one would expect for a function like that, would be to apply the operation to the whole array. Not along an axis. What would you expect as a new user if you call a minimum() function? A single value that is the minimum. So that is the logical choice for the default behavior, I would think.
nope. I'd expect it to be along an axis, by default the last one. To me, that's what vectorization is all about. Maybe this is because of my MATLAB (and now Numeric) background, but it makes the most sense to me that a method either returns an array of the same rank, or "reducing" methods return an array of rank reduced by one. Having a method return the same rank answer, no matter the rank of the input, is weird to me. This all depends on how you use arrays. I can see that if you tend to use a 2-d array to store an image, that the single minimum would seem logical, but for many other uses, each dimension has an independent meaning.
Yes, that would be the idea anyway. The question is what should be the default behavior for this type of functions, something I think we should not decide based on the current behavior of a single existing function, but based on what makes the most sense. That is obviously something that can be discussed...
yup, but frankly, this isn't about just one function, it's really about all the reductions: min, max, sum, etc, etc. I think the rule of thumb is not to break backward compatibility unless there is a compelling reason, and given that it's not clear what is most "natural" in this case, keeping the default the same makes the most sense. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Oct 26, 2004, at 6:19 PM, Chris Barker wrote:
Peter Verveer wrote:
It seems to me that the behavior one would expect for a function like that, would be to apply the operation to the whole array. Not along an axis. What would you expect as a new user if you call a minimum() function? A single value that is the minimum. So that is the logical choice for the default behavior, I would think.
nope. I'd expect it to be along an axis, by default the last one.
I still do not agree completely with that, I will elaborate more below, because I also do not agree anymore with my own earlier writings :-). But I see your point that this type of operation can be natural depending on what you are doing. Sometimes a single value does make sense, sometimes not, I think we can agree on that.
Yes, that would be the idea anyway. The question is what should be the default behavior for this type of functions, something I think we should not decide based on the current behavior of a single existing function, but based on what makes the most sense. That is obviously something that can be discussed...
yup, but frankly, this isn't about just one function, it's really about all the reductions: min, max, sum, etc, etc.
Actually no. It seems that sum() is a special case, along with a few others. Again: I elaborate on the general case below.
I think the rule of thumb is not to break backward compatibility unless there is a compelling reason, and given that it's not clear what is most "natural" in this case, keeping the default the same makes the most sense.
I agree. In contrast to what I have said before, I think we should keep it as it is, for compatibility. Now to elaborate on the general problem; please correct me if I get something wrong. I will use the minimum function as an example and come back to sum() later. If you look at a minimum operation, there are three different things you might like to do:
1) An element-by-element minimum: minimum(a1, a2). This is the current behaviour. Like all binary ufuncs of this type, it operates on pairs of arrays. So by default it does not do a reduction or calculate a single minimum. For most ufuncs that is the natural behavior anyway.
2) A reduction: minimum.reduce(a1). The reduce method of ufuncs is generally used for reductions. Having to use .reduce makes clear what you are doing. Although a bit odd at first sight, I think it is a clever way to overload ufunc names with different functionality.
3) The minimum of the array: in numarray you do a1.min(). I think in Numeric you have to do something like minimum.reduce(a1.flat), correct me if I am wrong. Not nice in either case...
Note that calling a binary ufunc with a single argument will give an error: minimum(a1) raises a TypeError. That seems to be a good decision, because people seem to have different ideas of what should happen: I would expect the minimum of the array, others expect a reduction. Generally I guess it was a wise decision not to change the meaning of a function depending on whether it has one or two arguments.
The sum() function is an alias for add.reduce. There are a few more of these aliases (e.g. product). I would still say that it is a bit unfortunate, since not everybody may immediately realize that these functions are in fact reductions. I wonder if one would not be better off without these functions at all; after all, you can access the functionality through .reduce(). If you mind the extra typing, just define your own alias. Can't we shift them into numarray.numeric? Just a thought...
In any case, clearly these functions need to stay around as they are for compatibility reasons. It is far more productive to add the functionality that a few people already proposed: allow reductions over multiple axes. I would welcome that, I always found 1D reductions a bit limited anyway. Obviously you can do sequential 1D reductions, but that can be quite inefficient. As proposed, the axis argument would take maybe a list of dimensions, and 'all' or None. I would like to propose an additional possibility: like minimum.reduce(), we could have a minimum.all() function that reduces over all dimensions (with a potentially much more efficient implementation.) We don't need a sum_all(a1) then, you would use add.all(a1). I guess this would be easily prototyped using sequential reductions, one can worry about efficiency later. Sorry for the long story... Cheers, Peter
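The three kinds of minimum described above, and the proposed reduction over several axes, can be sketched as follows. This is only an illustration under an assumption: it uses NumPy, a later library, as a stand-in for numarray, and the array values are made up. A `minimum.all()` method as proposed is not shown because NumPy ufuncs have no such method; `axis=None` plays that role here.

```python
import numpy as np

a1 = np.array([[4, 9, 2], [7, 1, 8]])
a2 = np.array([[5, 3, 6], [0, 2, 9]])

# 1) Element-by-element minimum of two arrays (binary ufunc call).
elementwise = np.minimum(a1, a2)

# 2) Reduction along one axis (axis 0 by default for ufunc.reduce).
along_axis0 = np.minimum.reduce(a1)

# 3) Minimum of the whole array.
whole_array = a1.min()

# Calling the binary ufunc with a single argument is an error.
try:
    np.minimum(a1)
    single_arg_error = None
except TypeError as exc:
    single_arg_error = exc

# Reduction over several axes at once, and over all axes.
cube = np.arange(24).reshape(2, 3, 4)
multi_axis = np.add.reduce(cube, axis=(0, 2))  # result has shape (3,)
all_axes = np.add.reduce(cube, axis=None)      # a single number
```

Passing a tuple of axes avoids the sequential 1D reductions mentioned above, although whether it is faster depends on the implementation.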
![](https://secure.gravatar.com/avatar/7d25e66cab04d869b99bf41281f11d07.jpg?s=120&d=mm&r=g)
I completely agree with this ... an efficient flat/ravel would seem to solve many of the issues being raised. Forgive the potentially naive question here, but is there any reason such an efficient, enhanced view can't be implemented for the .flat method?
I believe it is not possible without copying data. The strides between elements of a noncontiguous array are not always the same, so you cannot efficiently view it as a 1D array.
And it gets even worse for different-stride slices of N-D arrays (though I'm not yet ready to say it's impossible to do without copying). Maybe it's just me, but it does seem somewhat non-pythonic for a function/method to break for an inefficient case, instead of dropping back to less efficient (i.e., copying) behavior.
Functions should be coded in the first place to take multi-dimensional nature into account in my opinion. One of the points of numarray is that it is multi-dimensional. If a function can work over multiple dimensions, but it only works for 1D arrays, it is broken in my opinion. In my opinion sum() _is_ broken, and introducing a separate sum_all() is an ugly hack.
+1. ;-) Hence the thought to make flattening a single "enhanced" method/fcn ... to essentially eliminate the need for such ugly hacks. Typically, my functions accept N-D arguments, and can operate over a user-selected subset of these dimensions. I may pass a whole array, or every other column, or whatever. Judging from the history of this thread, I think a .flat that is as-efficient-as-possible and also robust to all forms of non-contiguity would benefit many, while also reducing the learning-curve issues associated with .flat vs ravel(). As for where/when/how to introduce .newandimprovedflat, welllllll, that's for another thread. ;-) Gary -------------------------------------------------------------- Gary Strangman, PhD | Director, Neural Systems Group Office: 617-724-0662 | Massachusetts General Hospital Fax: 617-726-4078 | 149 13th Street, Ste 10018 | Charlestown, MA 02129
![](https://secure.gravatar.com/avatar/80473ff660f57aa7f90affadd2240008.jpg?s=120&d=mm&r=g)
On Mon, 2004-10-25 at 09:19 -0700, Stephen Walton wrote:
On Mon, 2004-10-25 at 10:26 +0200, Peter Verveer wrote:
I think this may be inefficient, because ravel and flat may make a copy of the data. Also I think using flat/ravel in such a way is plain ugly and a complex way to do it.
You may be right about the copying, I couldn't say.
I just looked at the source (numeric-1.1/Lib/generic.py). The comment to the ravel() function states that it returns a view, not a copy; but it calls reshape() which does make a copy if the input array is not contiguous. I just tested this:
    A=arange(25,shape=(5,5))
    A.transpose() # now A is not contiguous
    v=ravel(A)
    A[2,2]=-17
    v # verifies that v did not change.
So, in the above, it does look like ravel() made a copy, and your fears about inefficiency are warranted. Another test shows that changing ravel(A) to A.flat above also results in a copy. Mayhaps we need sumall() after all. -- Stephen Walton, Professor of Physics and Astronomy, California State University, Northridge stephen.walton@csun.edu
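The same copy-versus-view check can be written without reading the output by eye. The sketch below assumes NumPy rather than numarray, so it is not the code that was actually run in this thread: in NumPy `A.T` returns a transposed view instead of transposing in place, and `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
At = A.T  # transposed view of A, not C-contiguous

v_contig = np.ravel(A)      # contiguous input: ravel can return a view
v_noncontig = np.ravel(At)  # noncontiguous input: ravel has to copy

shares_contig = np.shares_memory(v_contig, A)
shares_noncontig = np.shares_memory(v_noncontig, A)

# Modify the original; only the view sees the change.
A[2, 2] = -17
```

After the assignment, `v_contig[12]` reflects the new value while `v_noncontig[12]` keeps the old one, which is the behaviour reported above for the noncontiguous case.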
![](https://secure.gravatar.com/avatar/ba366a43ea0322ddb4cf2462f8ad2596.jpg?s=120&d=mm&r=g)
On 25 Oct 2004, at 18:34, Stephen Walton wrote:
On Mon, 2004-10-25 at 09:19 -0700, Stephen Walton wrote:
On Mon, 2004-10-25 at 10:26 +0200, Peter Verveer wrote:
I think this may be inefficient, because ravel and flat may make a copy of the data. Also I think using flat/ravel in such a way is plain ugly and a complex way to do it.
You may be right about the copying, I couldn't say.
I just looked at the source (numeric-1.1/Lib/generic.py). The comment to the ravel() function states that it returns a view, not a copy; but it calls reshape() which does make a copy if the input array is not contiguous. I just tested this:
    A=arange(25,shape=(5,5))
    A.transpose() # now A is not contiguous
    v=ravel(A)
    A[2,2]=-17
    v # verifies that v did not change.
So, in the above, it does look like ravel() made a copy, and your fears about inefficiency are warranted. Another test shows that changing ravel(A) to A.flat above also results in a copy. Mayhaps we need sumall() after all.
Yes, we do I guess, but I do not like such things creeping into an otherwise elegant package if I may be frank... Peter
![](https://secure.gravatar.com/avatar/ba366a43ea0322ddb4cf2462f8ad2596.jpg?s=120&d=mm&r=g)
On 25 Oct 2004, at 18:19, Stephen Walton wrote:
On Mon, 2004-10-25 at 10:26 +0200, Peter Verveer wrote:
On 25 Oct 2004, at 04:17, Stephen Walton wrote:
I don't think we need sumall. The methods and the functions should simply work the same way. If one wants sumall, use A.flat.sum() or, if you can't use the methods or attributes on your old version of Python, sum(ravel(A)).
I think this may be inefficient, because ravel and flat may make a copy of the data. Also I think using flat/ravel in such a way is plain ugly and a complex way to do it.
You may be right about the copying, I couldn't say. I don't think sum(ravel(A)) looks any worse than sum(sum(sum(A))) for a rank 3 array, but ugly is in the eye of the beholder.
It does not look worse, I agree with that! But I would argue it should have been sum(A) in the first place to sum over all axes... The sumall would not have been needed, and summing over one axis (or a subset of axes) could have been implemented as an optional argument to sum().
In my opinion functions that calculate a statistic like sum should return the total in the first place, rather than over a single axis.
It depends on the data. I use rank-2 arrays which are images and are therefore homogeneous. Even there, though, I often want the sum of all rows or all columns. For heterogeneous data (e.g., columns of different Y's as a function of X), the present sum() makes sense. In other words, we will always need ways to sum over just one dimension and over all dimensions. By analogy with MATLAB (I'm guessing), sum() in Numeric and numarray does a one-D sum.
I agree it is a useful feature, and it should still be possible to do that using an optional axis argument, even better I would love to be able to sum over several axes in one go, I find the one-dimensional character of reduce limiting, but I digress. In any case, I suppose we will stick with the current behaviour for backwards compatibility. Cheers, Peter
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Todd Miller wrote:
By default, sum() now uses the maximum type of the type family of the array, so families Bool, Integer, UnsignedInteger, Float, or Complex result in max types Bool, Int64, UInt64, Float64, Complex64. I'm not sure why we segregated Bool and it looks like a mistake to me now. I'm thinking the Bool "family" should just go away and be re-classified as UnsignedInteger.
Well, I think that the idea of a bool being different than an int is often useful. In this case, we want Bool to behave like an integer, so that we can use some version of sum() to add up all the true values. This is handy, but maybe we need more complete support for boolean arrays, rather than getting rid of them. For instance, there could be a NumTrue() function or method, for this case. I would probably maintain the easy conversion of a Bool array to an Int array, for when you really do need to do math with them. We'd want a complete set, many of which already exist. A few off the top of my head: sometrue, alltrue, numtrue. Maybe mirrors for false: somefalse, allfalse, numfalse. What else would be needed? My vote would be for all of these to be methods of a Bool array, but I'm partial to methods over functions anyway. On the other hand, Python itself is subclassing Bool from integer, so maybe there's little point. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/39916bae984cb93b797efd2b175f59c0.jpg?s=120&d=mm&r=g)
More new user feedback ... On Fri, 22 Oct 2004, Chris Barker apparently wrote:
Well, I think that the idea of a bool being different than an int is often useful.
Yes. E.g., applications to directed graphs.
we can use some version of sum() to add up all the true values.
Unclear, but given the existence of sometrue, it seems natural enough to let sum treat a Bool as an integer. Products work naturally, of course.
I would probably maintain the easy conversion of a Bool array to an Int array, for when you really do need to do math with them.
I would rephrase this. Boolean arrays have a naturally different math, which it would be nice to have supported. It would also be nice to easily convert to Int, when that representation captures the math needed.
We'd want a complete set, many of which already exist. A few off the top of my head: sometrue, alltrue, numtrue
I'd just let sum handle numtrue.
Maybe mirrors for false: somefalse, allfalse, numfalse
I'd just rely on alltrue, sometrue, and (size less sum) for these. fwiw, Alan
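A small sketch of the point that sum, alltrue and sometrue already cover the whole proposed set. It assumes NumPy as a stand-in for numarray, where `any()` and `all()` play the roles of sometrue and alltrue; the variable names are only illustrative.

```python
import numpy as np

mask = np.array([[True, False, True],
                 [False, False, True]])

num_true = int(mask.sum())         # sum treats True as 1
some_true = bool(mask.any())       # counterpart of sometrue
all_true = bool(mask.all())        # counterpart of alltrue

# The "false" mirrors follow from the three values above.
num_false = mask.size - num_true   # size less sum
some_false = not all_true
all_false = not some_true

# Explicit conversion for when integer arithmetic is really wanted.
as_int = mask.astype(np.int64)
```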
![](https://secure.gravatar.com/avatar/39916bae984cb93b797efd2b175f59c0.jpg?s=120&d=mm&r=g)
On 22 Oct 2004, Todd Miller apparently wrote:
sumAll() would certainly be better.
Unless there are objections, I'll rename the current sum() method to sumAll() and re-write sum() to give a deprecation warning before calling sumAll(). Eventually, it'll go away altogether.
Just two thoughts from a new user. i. I agree that .sumAll is better than the current name confusion. ii. even better, I propose, would be for .sum to take an axis argument, with default matching the sum function, and possible value axis="all". For the transition, the axis argument can be required. fwiw, Alan Isaac
![](https://secure.gravatar.com/avatar/39916bae984cb93b797efd2b175f59c0.jpg?s=120&d=mm&r=g)
On Fri, 22 Oct 2004 Alan G Isaac apparently wrote:
Just two thoughts from a new user. i. I agree that .sumAll is better than the current name confusion. ii. even better, I propose, would be for .sum to take an axis argument, with default matching the sum function, and possible value axis="all". For the transition, the axis argument can be required.
That should have been: axis=None fwiw, Alan Isaac
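For comparison, here is how an axis argument with a None value reads in practice. The sketch assumes NumPy, not numarray; in NumPy both the function and the method default to `axis=None`, which is a different default from the one proposed above, so it only illustrates the calling convention.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

total_method = A.sum()            # method, default axis=None
total_function = np.sum(A)        # function, same default in NumPy
explicit_all = A.sum(axis=None)   # sum over all elements, spelled out
per_column = A.sum(axis=0)        # sum down the rows, one value per column
per_row = A.sum(axis=1)           # sum across the columns, one value per row
```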
![](https://secure.gravatar.com/avatar/80473ff660f57aa7f90affadd2240008.jpg?s=120&d=mm&r=g)
On Fri, 2004-10-22 at 11:17, Russell E Owen wrote about the sum() Ufunc vs. the sum() method:
Numarray is already confusing enough without identically named functions and methods that do different things
When I went through the Numarray docs and made suggestions for improvements (see the list I posted at Sourceforge), I didn't make any comments about functional changes, only what the documentation said. Since the sum() method is documented using 1-D arrays, you can't tell that it in fact behaves differently than the sum() Ufunc. On reflection, I also agree that the Ufuncs and methods should behave the same way. Why do you say 'numarray is confusing'? What in the docs would help un-confuse it, in your view? -- Stephen Walton <stephen.walton@csun.edu> Dept. of Physics & Astronomy, Cal State Northridge
![](https://secure.gravatar.com/avatar/ad21d909c0ffcff2c377c7ee67aba291.jpg?s=120&d=mm&r=g)
At 2:35 PM -0700 2004-10-22, Stephen Walton wrote:
On Fri, 2004-10-22 at 11:17, Russell E Owen wrote about the sum() Ufunc vs. the sum() method:
Numarray is already confusing enough without identically named functions and methods that do different things
When I went through the Numarray docs and made suggestions for improvements (see the list I posted at Sourceforge), I didn't make any comments about functional changes, only what the documentation said. Since the sum() method is documented using 1-D arrays, you can't tell that it in fact behaves differently than the sum() Ufunc. On reflection, I also agree that the Ufuncs and methods should behave the same way.
Why do you say 'numarray is confusing'? What in the docs would help un-confuse it, in your view?
OK, since I seem to be in a grumpy mood today, here are some examples (probably nothing new here):
- I'll expose my ignorance, but I find the take stuff and fancy indexing nearly incomprehensible. I've tried to follow the examples (several times--i.e. every time I need to do something fancy), but generally I either flail around until I find something that works, or give up and write a C extension.
- I'd like to write C/C++ code that would work on multiple array types. This seems a natural use of C++ templates, but that doesn't seem to be "how it's done". I hate to think how the internal code is managing this without being a horrible spaghetti of code repeated for each array type. The nd_image package is the closest I've come to finding source code that makes any sense to me in this area. But it uses so many custom-defined specialized functions that I figured it was just too much work to figure out w/out a manual (and risky to rely on these functions since they are internal to the package). So I gave up and just support the one data type I really need now. Very disappointing.
- Important functions are sometimes buried in a non-obvious (to me) sub-package. For example: try to find that location at which an array has a minimum value (if there's more than one such point, pick any). You'd think it'd be a standard numarray function, wouldn't you? After all, you can ask for the minimum value. Now try to find it. Well, I started out by trying to figure out how to get argmin to do the job. Horrible. Fortunately I finally found minimum_position buried in nd_image.
- Masked arrays are not integrated. Thus a lot of important filtering and stuff simply cannot be done on masked data without writing custom extensions. For instance I'd like to do a median-filter that ignores masked data (taking the median of non-masked data only).
- For 2-d images x and y are reversed. I know this isn't going to change, but it is a headache every time I have to write new image processing code.
- I keep wanting more support for dealing with arrays of indices, e.g. "give me all the indices for which this is true", then use that to process the data in an array. Numarray seems to do that kind of operation in an entirely different way, suggesting I'm not "with it" on the underlying philosophy. Unfortunately no really good examples come to mind at the moment (it's been awhile since I've created new code using numarray), though I was fairly well convinced that if I had enough support for this I could code an efficient radial profile function w/out using a C extension. -- Russell
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
Todd and I will be away most of next week at a conference and will likely not have a chance to respond to questions about numarray or continue the current discussions about the proper numarray interface or improvements to the documentation. Perry
![](https://secure.gravatar.com/avatar/80473ff660f57aa7f90affadd2240008.jpg?s=120&d=mm&r=g)
I had no idea my innocent question would generate so much discussion. Mindful that Perry and Todd are at ADASS in Pasadena next week: On Fri, 2004-10-22 at 15:18 -0700, Russell E Owen wrote:
At 2:35 PM -0700 2004-10-22, Stephen Walton wrote:
Why do you say 'numarray is confusing'? What in the docs would help un-confuse it, in your view?
- I'll expose my ignorance, but I find the take stuff and fancy indexing nearly incomprehensible.
I agree. It took me much experimentation to figure out exactly how it worked. I'd appreciate it very much if you would look at my suggested rewrite of this section of the documentation at http://sourceforge.net/tracker/index.php?func=detail&aid=1047889&group_id=1369&atid=101369 and give me any further thoughts for clarification (post them as comments to the bug report itself).
- I'd like to write C/C++ code that would work on multiple array types.
I can't help much here, other than to say that C and C++ are pretty low level languages, not well suited for this level of abstraction.
- Important functions are sometimes buried in a non-obvious (to me) sub-package. For example: try to find that location at which an array has a minimum value
The current index to the documentation seems to include only the function names but not concepts, which is a problem. I myself was trying to remember how to do type conversion; there is no entry in the index for 'conversion' or 'coercion' and I finally grepped my local copy of the HTML files to re-find astype().
- Masked arrays are not integrated.
I haven't tried these yet personally, but I agree that such a feature is a very important one. IRAF got partway along on this but didn't finish it either. Having said that, my workaround/technique for both MATLAB and numarray is to simply put NaNs in the places where there is no valid data and do something like sum(sum(A(~isnan(A)))). This is MATLAB syntax of course. Something similar in numarray would go a long way to helping me. For example, I have full disk solar images and I'd like to be able to operate on just the sunspot pixels, or just the sky pixels, in a straightforward way.
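The NaN workaround translates fairly directly to array code. The sketch below assumes NumPy rather than numarray, and the image values and the 2.5 threshold standing in for "sunspot pixels" are made up for illustration.

```python
import numpy as np

image = np.array([[1.0, np.nan, 3.0],
                  [4.0, 5.0, np.nan]])

# Boolean mask of valid pixels, then sum only those.
valid = ~np.isnan(image)
values = image[valid]               # 1-D array of the valid pixels
total_valid = float(values.sum())

# NumPy also has a reduction that skips NaNs directly.
total_nansum = float(np.nansum(image))

# Operate on a subset of the valid pixels (hypothetical threshold).
bright_sum = float(values[values > 2.5].sum())
```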
- For 2-d images x and y are reversed.
Are you referring to the fact that C and numarray are row major and Fortran is column major? Or to how images get displayed in the various plot packages?
- I keep wanting more support for dealing with arrays of indices, e.g. "give me all the indices for which this is true", then use that to process the data in an array. Numarray seems to do that kind of operation in an entirely different way, suggesting I'm not "with it" on the underlying philosophy.
There are two ways to do this, both of which work. For example:
    A=arange(25)
    sum(A[A<=7])
will work just as you expect. A bool array used as an index picks out those values for which the bool is True. Essentially identical syntax now works in MATLAB too. If you want an index array instead:
    index=where(A<7)
    A[index]
will do the trick. For arrays of rank greater than 1:
    A=arange(25,shape=(5,5))
    where(A<7)
    (array([0, 0, 0, 0, 0, 1, 1]), array([0, 1, 2, 3, 4, 0, 1]))
which is a tuple of two arrays that can be used to index A:
    ind1,ind2=where(A<7)
    A[ind1,ind2]
    array([0, 1, 2, 3, 4, 5, 6])
    A[ind1,ind2]=[6,5,4,3,2,1,0] # assignment works too
    A
    array([[ 6,  5,  4,  3,  2],
           [ 1,  0,  7,  8,  9],
           [10, 11, 12, 13, 14],
           [15, 16, 17, 18, 19],
           [20, 21, 22, 23, 24]])
Does this help? -- Stephen Walton <stephen.walton@csun.edu> Physics & Astronomy CSUN
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
A few comments on a number of posts in this thread: Stephen Walton wrote:
- I'd like to write C/C++ code that would work on multiple array types.
I can't help much here, other than to say that C and C++ are pretty low level languages, not well suited for this level of abstraction.
Well, this is certainly true for C, but not so much for C++. I'm no expert, but C++ templates could be very handy here. When the numarray project was just getting started, there was some discussion about using a template-based array package as the base, perhaps Blitz++. I still think this was a great idea, but I think the biggest issue at the time was that templates were still not consistently well supported by the wide variety of compilers that numarray should work with. Personally I think that anything supported by gcc should be fine, as anyone can use gcc on virtually any platform, if they want. Anyway, it's too late to re-write numarray, but maybe a numarray <--> blitz++ conversion package would make it easy to write numarray extensions with blitz++. Perhaps even integrate it with Boost.Python. Another option would be to write a template-based wrapper around the existing Numarray objects. By the way, my other issue with extensions is the difficulty of writing extensions that support noncontiguous arrays, in addition to multiple data types. It seems someone smarter than me could use C++ classes to solve this one as well.
Peter Verveer wrote:
But I do agree that it is not a good idea to introduce another set of names. In my opinion functions that calculate a statistic like sum should return the total in the first place, rather than over a single axis.
Absolutely not! I'm far more likely to want it over a single axis, it's the core of "vectorizing" your code. If the data all mean the same thing, why aren't you storing it in a 1-d array? That being said, it should be easy to do various reductions over all axes, which I think .flat() does nicely. I thought .flat() never made a copy: am I wrong?
Stephen Walton wrote:
It depends on the data. I use rank-2 arrays which are images and are therefore homogeneous.
OK, good example.... I take back some of what I said above!
By analogy with MATLAB (I'm guessing), sum() in Numeric and numarray does a one-D sum.
except MATLAB does it worse. If your 2-d array happens to have only one row, you get the sum over that... yecch!
Tim Hochberg wrote:
I'm not sure how feasible it is, but I'd much rather an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [ and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
Hear! Hear! I thought that was exactly what .flat() was for. Shows what I know! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/ba366a43ea0322ddb4cf2462f8ad2596.jpg?s=120&d=mm&r=g)
Stephen Walton wrote:
- I'd like to write C/C++ code that would work on multiple array types. I can't help much here, other than to say that C and C++ are pretty low level languages, not well suited for this level of abstraction.
Well, this is certainly true for C, but not so much for C++. I'm no expert, but C++ templates could be very handy here. When the numarray project was just getting started, there was some discussion about using a template-based array package as the base, perhaps Blitz++. I still think this was a great idea, but I think the biggest issue at the time was that templates were still not consistently well supported by the wide variety of compilers that numarray should work with. Personally I think that anything supported by gcc should be fine, as anyone can use gcc on virtually any platform, if they want.
I think having the option of using C++ would be cool. But as soon as we would 'require' it, I would not develop for numarray anymore. C++ is a big pain in my opinion, although I do agree that a well written templating system like Blitz++ is nice if you actually use C++.
Anyway, it's too late to re-write numarray, but maybe a numarray <--> blitz++ conversion package would make it easy to write numarray extensions with blitz++. Perhaps even integrate it with Boost.Python. Another option would be to write a template-based wrapper around the existing Numarray objects.
yes, it would be nice to have the option. There is no reason why there could not be a C++ API which would include the use of templates layered on top of the current C API for those people that would like to use it.
By the way, my other issue with extensions is the difficulty of writing extensions that support noncontiguous arrays, in addition to multiple data types. It seems someone smarter than me could use C++ classes to solve this one as well.
I had to deal with that problem too in nd_image. It is doable, albeit ugly if you depend on plain C. Probably C++ could do it differently and more nicely; Blitz++ possibly does. Again, not for me.
Peter Verveer wrote:
But I do agree that it is not a good idea to introduce another set of names. In my opinion functions that calculate a statistic like sum should return the total in the first place, rather than over a single axis.
Absolutely not! I'm far more likely to want it over a single axis, it's the core of "vectorizing" your code. If the data all mean the same thing, why aren't you storing it in a 1-d array?
I agree that it is important, I am just saying that both are very common operations. Why not support operations over an axis by an optional argument? You will often have to specify which axis you want anyway.
That being said, it should be easy to do various reductions over all axes, which I think .flat() does nicely. I thought .flat() never made a copy: am I wrong?
Unfortunately, flattening an array is not always possible without copying, due to the fact that arrays may be not contiguous in memory.
Tim Hochberg wrote:
I'm not sure how feasible it is, but I'd much rather an efficient, non-copying, 1-D view of a noncontiguous array (from an enhanced version of flat or ravel or whatever) than a bunch of extra methods. The former allows all of the standard methods to just work efficiently using sum(ravel(A)) or sum(A.flat) [ and max and min, etc]. Making special whole array methods for everything just leads to method explosion.
Hear! Hear! I thought that was exactly what .flat() was for. Shows what I know!
It is however not feasible I think to do it efficiently. It seems to me that a set of functions is necessary to do things like sum, minimum and so on, that work on the whole array. I would also prefer they are not methods. Introducing a whole array of sum_all() like functions is also not great. Cheers, Peter
![](https://secure.gravatar.com/avatar/ba366a43ea0322ddb4cf2462f8ad2596.jpg?s=120&d=mm&r=g)
I thought I would just give my point of view on this, since I do believe we should give these some thought. On Oct 23, 2004, at 12:18 AM, Russell E Owen wrote:
OK, since I seem to be in a grumpy mood today, here are some examples (probably nothing new here): - I'll expose my ignorance, but I find the take stuff and fancy indexing nearly incomprehensible. I've tried to follow the examples (several times--i.e. every time I need to do something fancy), but generally I either flail around until I find something that works, or give up and write a C extension.
I agree, it is very complicated, I always have trouble understanding what is going on when I use take and indexing. More documentation may help.
- I'd like to write C/C++ code that would work on multiple array types. This seems a natural use of C++ templates, but that doesn't seem to be "how it's done". I hate to think how the internal code is managing this without being a horrible spaghetti of code repeated for each array type.
This is a good point. If you look at examples for implementing something in C, you always see that the code only handles a single data type, usually converting all input to double type. That is not always a good way to write an extension if you want it to be of generic use (e.g. the FFT module does not handle 32-bit floating point well, which is a problem for big arrays). Some support in writing functions that handle multiple data types would be good.
The nd_image package is the closest I've come to finding source code that makes any sense to me in this area. But it uses so many custom-defined specialized functions that I figured it was just too much work to figure out w/out a manual (and risky to rely on these functions since they are internal to the package).
The internal nd_image C functions are indeed not exported and should not be used to implement extensions. That is going to stay that way since I do not plan to document these, and in any case, exposing such functions is not the purpose of the module. On the other hand, some of the techniques used may be generally useful. I could try to factor some of the functions and macros out and write something up on the use of these to write extensions that handle multiple data types.
So I gave up and just support the one data type I really need now. Very disappointing.
Yes, it should be easier to do this, I agree. Using C macros as a 'poor man's' templating system is in fact not too complicated (although pretty ugly). Another approach that I have tried to use in nd_image is to provide generic functions that take a Python or a C function to implement functionality. For instance, to implement an arbitrary filter function in nd_image you only need to implement a function that calculates the filter at one point. You then call a generic filter function that does the heavy lifting of dealing with multiple array types, iterating over the array, dealing with borders and such, applying the function at each array element. The filter function can be in Python, but can also be a C function, communicated by a CObject. Maybe some functions of this type could be provided with the numarray package. This could simplify writing extensions a lot. Would there be interest in a package of such functions? If there is, I could think about it a bit more, and propose (and implement) something in the form of an extension.
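The division of labour described here, where the caller supplies a function for one point and a generic driver handles types, iteration and borders, might look roughly like this in pure Python. This is a 1-D sketch on NumPy, not the nd_image implementation; `generic_filter1d` is a hypothetical name, the window size is assumed to be odd, and borders are handled by repeating the edge values.

```python
import numpy as np

def generic_filter1d(values, func, size):
    """Hypothetical driver: apply func to every sliding window of odd length size."""
    values = np.asarray(values)             # accepts any numeric input type
    half = size // 2
    padded = np.pad(values, half, mode="edge")  # border handling lives here
    out = np.empty(values.shape, dtype=np.float64)
    for i in range(values.shape[0]):
        # The user-supplied function only ever sees one window.
        out[i] = func(padded[i:i + size])
    return out

# Median filter on integer data, maximum filter on 32-bit float data.
smoothed = generic_filter1d([1, 5, 2, 8, 3], np.median, 3)
peaks = generic_filter1d(np.array([1, 5, 2, 8, 3], dtype=np.float32), np.max, 3)
```

The callback knows nothing about the array type or the borders, which is the part that makes such a driver reusable.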
- Important functions are sometimes buried in a non-obvious (to me) sub-package.
For example: try to find the location at which an array has a minimum value (if there's more than one such point, pick any). You'd think it'd be a standard numarray function, wouldn't you? After all, you can ask for the minimum value. Now try to find it.
Agreed, this bothered me too.
Well, I started out by trying to figure out how to get argmin to do the job. Horrible.
Fortunately I finally found minimum_position buried in nd_image.
It is there because numarray did not provide it... But it is also there because it offers much functionality that would not be appropriate for the main package. It is part of the object measurement functions. A simpler, possibly more efficient routine should maybe be part of the main package.
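For comparison, a simple routine of the kind suggested above can be sketched in present-day NumPy by combining `argmin` on the flattened array with `unravel_index`. This is an illustration, not the numarray or nd_image implementation; the helper name `argmin_position` is hypothetical.

```python
import numpy as np

def argmin_position(a):
    """Return the N-d index of one minimum of `a` as a tuple of ints.

    Hypothetical helper. If the minimum occurs more than once,
    the first occurrence in flattened (row-major) order is returned.
    """
    a = np.asarray(a)
    flat_index = np.argmin(a)            # index into the flattened array
    index = np.unravel_index(flat_index, a.shape)
    return tuple(int(i) for i in index)  # plain ints for readability

a = np.array([[4, 9, 7],
              [6, 1, 8]])
position = argmin_position(a)
value = a[position]                      # the minimum value itself
```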
- Masked arrays are not integrated. Thus a lot of important filtering and stuff simply cannot be done on masked data without writing custom extensions. For instance I'd like to do a median-filter that ignores masked data (taking the median of non-masked data only).
I agree very much! To be honest, I do not like the ma package much. I don't like the idea of having to use a separate package with a different array type that duplicates the functionality in the main package. I think it would be much better if all functions (where it makes sense) in numarray would accept an optional mask argument. To me it makes more sense to provide the mask with the operation, not as part of the array like in ma (a package like ma could still be layered on top.) I realize it would be a lot of work to make all numarray functions mask aware, but it is something to think about maybe.
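As an illustration of passing the mask with the operation rather than storing it in the array, here is a minimal 2-d median filter that skips masked pixels. It is a sketch in present-day NumPy under several assumptions: the name `masked_median_filter`, the convention that `True` in the mask means "ignore this pixel", truncated windows at the borders, and NaN as the result when a whole window is masked.

```python
import numpy as np

def masked_median_filter(image, mask, size=3):
    """Median filter over size x size windows, ignoring masked pixels.

    Hypothetical helper; mask is True where data should be ignored.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    half = size // 2
    rows, cols = image.shape
    out = np.full(image.shape, np.nan)   # NaN where no valid data exists
    for r in range(rows):
        for c in range(cols):
            # Clip the window at the array borders.
            r0, r1 = max(r - half, 0), min(r + half + 1, rows)
            c0, c1 = max(c - half, 0), min(c + half + 1, cols)
            window = image[r0:r1, c0:c1]
            valid = window[~mask[r0:r1, c0:c1]]
            if valid.size:
                out[r, c] = np.median(valid)
    return out

image = np.array([[1, 2, 3],
                  [4, 100, 6],
                  [7, 8, 9]])
mask = np.zeros(image.shape, dtype=bool)
mask[1, 1] = True                        # treat the outlier as missing
filtered = masked_median_filter(image, mask)
```

The explicit double loop keeps the sketch readable; it is far too slow for real images and only shows where the mask enters the calculation.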
- For 2-d images x and y are reversed. I know this isn't going to change, but it is a headache every time I have to write new image processing code.
This is not really a problem I think, but you have to get used to it. If you treat the last dimension always as X and the first as Y, you have the same layout in memory as is usual in most image processing software. So X corresponds to axis=1 and Y to axis=0. Or use axis=-1 and axis=-2. Cheers, Peter
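The axis convention described above can be checked with a small example in present-day NumPy. The array contents and variable names are made up for illustration.

```python
import numpy as np

height, width = 2, 3
# Shape is (Y, X): the first axis is Y (rows), the last axis is X (columns).
image = np.arange(height * width).reshape(height, width)

x, y = 2, 1
pixel = image[y, x]              # index as [y, x], not [x, y]

sum_along_x = image.sum(axis=1)  # collapses X, one value per row
sum_along_y = image.sum(axis=0)  # collapses Y, one value per column

# Negative axes give the same results regardless of how you count.
same_x = image.sum(axis=-1)
same_y = image.sum(axis=-2)
```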
participants (10)
- Alan G Isaac
- Chris Barker
- Fernando Perez
- Gary Strangman
- Perry Greenfield
- Peter Verveer
- Russell E Owen
- Stephen Walton
- Tim Hochberg
- Todd Miller