Re: [Numpy-discussion] Counting array elements
I have returned from our astronomical data systems conference, and I am going to take a shortcut and summarize what I saw as the key developments of this thread. I apologize for not responding sooner and individually, but the web-mail system I use isn't effective for conducting any kind of discussion. You guys did a great job sorting this out this week. I marked my key points with **. The rest is probably only for people with a lot of patience.

** I've finally come to terms with the fact that functions are the right way to do numarray rather than methods. The arguments in the Numeric manual are no more persuasive now than they ever were, but Stephen Walton's remarks about method explosion finally convinced me that the "real" reason for doing functions is that using methods combines every new feature under the umbrella of a single namespace, the NumArray class. Using functions lets us partition things into modules which can be used selectively, and makes for a more extensible and understandable system. Thanks Stephen.

A couple of people remarked that using .flat might solve everything with something like a.flat.sum() or sum(ravel(a)). This gets to the original motivation for the sum() method, which was the codification of a simple and storage-efficient technique for reducing non-contiguous arrays. The first point is that a non-contiguous array cannot generally be reshaped without making a copy. The basic idea of the sum() method is to do *two* reductions. The first, along a single axis, results in a smaller contiguous array. In the case of astronomical images, which are generally square or at least non-degenerate, the reduction result is a *much* smaller array. The second reduction handles all the remaining dimensions, since .flat is guaranteed to work because the array is contiguous. The end result is a complete sum() without writing additional ufuncs or making an array copy.

There was understandable confusion about why .flat is sometimes allowed to fail. Since it is an attribute, we thought it inappropriate to make it return a copy of the source array and chose instead to raise an exception. In contrast, it is reasonable for the ravel() function to return a completely different array, so it always works. (I just noticed that ravel() is not named flat()). Some of our more contemporary thinkers suggested using iterators to produce a .flat which always works. If anyone has an idea how to make this work with good performance, please let me know; I don't.

** Tim Hochberg pointed out that we can overload the reduction (and not accumulation?) axis parameter with an "all" or a tuple describing a sequence of axes to reduce along. My perception was that there was a consensus behind this, and in any case I'm in agreement with Tim. Alan Isaac pointed out that None might be better here than "all", and I agree. At this point, I think sumAll() is dead, the sum() method will be deprecated, and the reductions should be expanded as Tim suggested.

** Peter Verveer made some comments about the expectations of a naive user regarding reductions, namely that "all" should be the default. My own experience bears this out, and I am torn about what to do here. Chris Barker pointed out the need for backward compatibility with Numeric, and given the current numarray goal of supporting SciPy, this need is growing stronger and more complex. SciPy uses yet another axis convention. If anyone has any ideas how to handle these multiple conventions with elegance, let me know.

A number of people commented on our naming conventions, an issue which we have sidestepped for the moment with sumAll(). My impression is that, for better or worse, numarray uses the lowerUpper() version of camel case. I think this is very much a matter of personal taste and don't claim to have any. My guess is that numarray is probably inconsistent at the moment, in part because lowerUpper() often degenerates into merely lower(), which degenerates into confusion.

Regards, Todd
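A minimal sketch of the two-reduction idea described above. It is written against modern NumPy rather than numarray (an assumption made so the example can run today), and `two_stage_sum` is a hypothetical name, not numarray's sum() implementation.

```python
import numpy as np


def two_stage_sum(a):
    """Sum every element of `a` without copying the full array.

    Hypothetical helper illustrating the technique from the message;
    it is not taken from numarray.
    """
    if a.ndim == 0:
        return a[()]
    # Stage 1: reduce along a single axis. The result is a freshly
    # allocated, smaller, contiguous array.
    partial = np.asarray(np.add.reduce(a, axis=0))
    # Stage 2: the partial result is contiguous, so flattening it is a
    # cheap view and one more reduction finishes the job.
    return np.add.reduce(partial.reshape(-1))


image = np.arange(24.0).reshape(4, 6)
view = image[::2, ::3]          # non-contiguous slice: elements 0, 3, 12, 15
total = two_stage_sum(view)     # 30.0
```

For an N x N image the intermediate array has only N elements, which is why the message calls the stage-one result "much smaller".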
** Peter Verveer made some comments about the expectations of a naive user regarding reductions, namely that "all" should be the default. My own experience bears this out, and I am torn about what to do here. Chris Barker pointed out the need for backward compatibility with Numeric, and given the current numarray goal of supporting SciPy, this need is growing stronger and more complex. SciPy uses yet another axis convention. If anyone has any ideas how to handle these multiple conventions with elegance, let me know.
Numarray should probably be either completely compatible in every small detail, or we could take the opportunity to change what we believe was the wrong choice. I'm not sure what is really best, although I personally feel breaking compatibility is fine if the result is better. Is there not already a sub-package numeric within numarray that provides Numeric compatibility? Such a package could at least provide wrappers with compatible behavior for people who need that. Peter
Peter Verveer wrote:
Numarray should probably be either completely compatible in every small detail, or we could take the opportunity to change what we believe was
Well, as I mentioned before, having numarray match Numeric in every small detail is not going to happen (and even there, which flavor? The original Numeric or the scipy version?). We've been pretty clear about where incompatibilities were deliberate. On the other hand, that leaves many other choices that could be revisited if enough people support them. The problem is that no matter what is done, I suspect some people are going to be inconvenienced, since there is already (without numarray) a split in the community because of scipy.
the wrong choice. I'm not sure what is really best, although I personally feel breaking compatibility is fine if the result is better. Is there not already a sub-package numeric within numarray that provides Numeric compatibility? Such a package could at least provide wrappers with compatible behavior for people who need that.
At the moment the numeric module provides more Numeric compatibility (but not complete). In matplotlib we use a module called numerix to provide a uniform interface to both Numeric and numerix (along with prohibitions on use of certain features that don't exist in the other). We are looking at scipy_base now, which will undoubtedly highlight similar cases where we will suggest internal reorganization to do the same sort of thing that was done for matplotlib. Those who intend to use numarray only, now and forever, are free to use all the features they desire. But there is still the behavior issue of those things that are currently incompatible, like the axis issue. Perry
On Sun, 2004-10-31 at 12:31, Perry Greenfield wrote: [SNIP]
the wrong choice. I'm not sure what is really best, although I personally feel breaking compatibility is fine if the result is better. Is there not already a sub-package numeric within numarray that provides Numeric compatibility? Such a package could at least provide wrappers with compatible behavior for people who need that.
At the moment the numeric module provides more Numeric compatibility (but not complete). In matplotlib we use a module called numerix to provide a uniform interface to both Numeric and numerix (along with
In case anyone was scratching their head, I think Perry meant to say "both Numeric and numarray" here. In particular, matplotlib's array package proxy, numerix, uses numarray.numeric and some of the add-ons to supply Numeric-like functionality. Because numarray.numeric is Numeric-like, it is actually a subset of numarray, with differences in put(), take(), and nonzero() among other things. On the plus side, the simpler numeric put(), take(), and nonzero() are ports (can't get more compatible... any difference is a bug) and pure C (so they're faster for small arrays... but still slower than Numeric). Regards, Todd
Todd Miller wrote: [SNIP]
** Tim Hochberg pointed out that we can overload the reduction (and not accumulation?)
It seems possible. It's probably marginally useful at best. However, it might be worth doing if not too painful, just so that the accumulate and reduce signatures match.
axis parameter with an "all" or a tuple describing a sequence of axes to reduce along. My perception was that there was a consensus behind this and in any case I'm in agreement with Tim. Alan Isaac pointed out that None might be better here than "all" and I agree.
Using None to mean ALL seems a little perverse to me, but I'll grant that using an existing singleton makes things simpler. I'll just point out that it would also be possible to define an ALL singleton and use that. Very tangential: it's too bad that '...' can't be typed more places: the natural spelling for ALL is [...] as in: add.reduce(a, axis=[...]) Sadly, that won't work.
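As a rough illustration of the options discussed here, the sketch below shows a reduction whose axis argument accepts an integer, a tuple of axes, None, or a dedicated ALL singleton. It uses modern NumPy's `np.add.reduce` as a stand-in for numarray (an assumption); `ALL` and `reduce_add` are hypothetical names invented for this example.

```python
import numpy as np

# Hypothetical singleton standing in for the "ALL" object Tim mentions.
ALL = object()


def reduce_add(a, axis=0):
    """Hypothetical wrapper: map the ALL singleton onto NumPy's axis=None."""
    if axis is ALL:
        axis = None
    return np.add.reduce(a, axis=axis)


a = np.arange(24).reshape(2, 3, 4)

everything = reduce_add(a, axis=ALL)     # scalar sum over every axis
same_thing = reduce_add(a, axis=None)    # None spelled directly
two_axes = reduce_add(a, axis=(0, 2))    # tuple of axes, result has shape (3,)
```

Reusing None avoids a new public name, while a dedicated singleton reads more explicitly at the call site; the wrapper shows the two spellings can coexist.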
At this point, I think sumAll() is dead, the sum() method will be deprecated, and the reductions should be expanded as Tim suggested.
** Peter Verveer made some comments about the expectations of a naive user regarding reductions, namely that "all" should be the default. My own experience bears this out, and I am torn about what to do here.
I suspect that one's experience here depends on one's typical problem domain. If one does a lot of 2D work, ALL would seem to be the natural choice. If you use a lot of arrays of vectors, as I do, -1 is the natural choice. At this point I can't recall a case where ALL would have been the natural choice for me. In addition to backwards compatibility, one argument for not using ALL as the default is that it makes little or no sense for accumulate. Having the default for reduce be ALL, but that for accumulate be -1 (for instance), would be confusing.
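To make the two viewpoints concrete, here is a small sketch using modern NumPy in place of numarray (an assumption). It treats a 2-D array as an array of vectors, reduces and accumulates along the last axis, and contrasts that with a reduction over everything.

```python
import numpy as np

# Two 3-vectors stored as rows.
vectors = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

# "Array of vectors" view: reduce along the last axis, one result per vector.
per_vector = np.add.reduce(vectors, axis=-1)       # [6., 15.]

# "Image" view: reduce over every axis to get a single number.
everything = np.add.reduce(vectors, axis=None)     # 21.0

# accumulate keeps the input shape, so it needs one concrete axis to run along.
running = np.add.accumulate(vectors, axis=-1)      # [[1., 3., 6.], [4., 9., 15.]]
```

Because accumulate returns an array of the same shape as its input, an "all axes" default has no obvious meaning for it, which is the asymmetry the paragraph above points out.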
Chris Barker pointed out the need for backward compatibility with Numeric,
I'd think that the importance of backward compatibility with not just Numeric, but with Numarray itself, has been underrated. Changing the default for reduce / sum is particularly insidious since many uses will fail silently, producing the wrong answer but continuing to run. This means that all instances of sum, product and reduce will need to be inspected and corrected. Having 10k LOC that use Numarray, I'll be a bit irked if this gets changed without a better justification than what I've seen thus far.
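The silent-failure concern can be sketched as follows. The example uses modern NumPy (an assumption), and `column_fractions` with its `default_axis` parameter is a hypothetical function that simulates a library changing its default reduction axis from 0 to "all".

```python
import numpy as np


def column_fractions(a, default_axis):
    """Hypothetical user code written assuming the reduction runs along axis 0.

    `default_axis` simulates whatever default the library applies.
    """
    totals = np.add.reduce(a, axis=default_axis)
    # Broadcasting accepts either a per-column vector or a single scalar,
    # so no exception is raised when the default changes.
    return a / totals


a = np.array([[1.0, 3.0],
              [3.0, 1.0]])

old_result = column_fractions(a, default_axis=0)      # each column sums to 1
new_result = column_fractions(a, default_axis=None)   # whole array sums to 1
```

Both calls return a 2 x 2 array without complaint, but the values differ, so only an audit of each call site would reveal the change.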
and given the current numarray goal of supporting SciPy, this need is growing stronger and more complex. SciPy uses yet another axis convention. If anyone has any ideas how to handle these multiple conventions with elegance, let me know.
Could you describe the SciPy axis convention? I'm not familiar with it. [SNIP] -tim
Todd Miller wrote:
There was understandable confusion about why .flat is sometimes allowed to fail. Since it is an attribute, we thought it inappropriate to make it return a copy of the source array and chose instead to raise an exception. In contrast, it is reasonable for the ravel() function to return a completely different array, so it always works. (I just noticed that ravel() is not named flat()). Some of our more contemporary thinkers suggested using iterators to produce a .flat which always works. If anyone has an idea how to make this work with good performance, please let me know; I don't.
This aspect of flat can be considered a wart. There are three different desired behaviors depending on who you talk to. For efficiency reasons, some only want flat (and even ravel) to work if the array is already contiguous; that is, they don't want copies unless they ask for them. Others want it to always work, producing a copy if necessary but otherwise for it to return a view. Yet others always want a copy. So, are three different versions needed? Or options to a function? The drawback of .flat (as an attribute) is there is only one choice for behavior. For a function (or a method) we could modify the behavior with a keyword argument. Personally, I would rather .flat always work, even if it means returning a copy. Is there any consensus on how this problem should be handled?
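A sketch of the "function with a keyword argument" option, covering the three behaviors listed above. It relies on modern NumPy (an assumption), where `ravel()` returns a view when it can and `flatten()` always copies; `flattened` and its `copy` keyword are hypothetical names for illustration.

```python
import numpy as np


def flattened(a, copy="if_needed"):
    """Hypothetical 1-D flattening function with selectable copy behavior.

    copy="never":     fail unless `a` is already C-contiguous (no copies).
    copy="if_needed": return a view when possible, otherwise a copy.
    copy="always":    always return a fresh copy.
    """
    if copy == "never":
        if not a.flags["C_CONTIGUOUS"]:
            raise ValueError("array is not contiguous; refusing to copy")
        return a.reshape(-1)
    if copy == "if_needed":
        return a.ravel()
    if copy == "always":
        return a.flatten()
    raise ValueError("copy must be 'never', 'if_needed', or 'always'")


base = np.arange(6).reshape(2, 3)
transposed = base.T                                  # non-contiguous view

view = flattened(base, copy="never")                 # shares memory with base
maybe = flattened(transposed, copy="if_needed")      # has to copy here
fresh = flattened(base, copy="always")               # never shares memory
```

An attribute cannot take such a keyword, which is the drawback of .flat raised in the message.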
** Peter Verveer made some comments about the expectations of a naive user regarding reductions, namely that "all" should be the default. My own experience bears this out, and I am torn about what to do here. Chris Barker pointed out the need for backward compatibility with Numeric, and given the current numarray goal of supporting SciPy, this need is growing stronger and more complex. SciPy uses yet another axis convention. If anyone has any ideas how to handle these multiple conventions with elegance, let me know.
I find this issue particularly vexing as well. Let's be clear about this: scipy changes the behavior of Numeric to produce a new flavor. What should numarray do? Follow the scipy behavior or the Numeric behavior? Or should there be a scipy/numarray flavor vs. the more Numeric-compatible numarray? Note, we never intended numarray to be 100% compatible with Numeric, since there were aspects we thought should be changed (e.g., scalar/array type coercions). Yet there appear to be two camps in the Numeric community. Some sort of survey may be in order here. Is scipy where all the new growth is now? Should we just adopt the axis convention used there? I'd very much prefer not to proliferate any more flavors of behavior and just settle on one.
A number of people commented on our naming conventions, an issue which we have side stepped for the moment with sumAll(). My impression is that, for better or worse, numarray uses the lowerUpper() version of Camel case. I think this is very much a matter of personal taste and don't claim to have any. My guess is that numarray is probably inconsistent at the moment, in part because lowerUpper() often degenerates into merely lower() which degenerates into confusion.
How much of the public interface uses camelCase? I don't think all that much if any. It seems to me the inclination of scipy is to avoid it and I'm happy with that. The internal implementation is a different issue, and there I think Todd is right that it probably is somewhat inconsistent on that front. Perry
Perry Greenfield wrote:
This aspect of flat can be considered a wart. There are three different desired behaviors depending on who you talk to. For efficiency reasons, some only want flat (and even ravel) to work if the array is already contiguous; that is, they don't want copies unless they ask for them.
This isn't just efficiency: having a function (or method) that sometimes returns a copy, and sometimes a reference is asking for bugs. What happens if I make a change to the result in a function? Sometimes it will change the parent array, sometimes not.
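The hazard can be reproduced with a "view if possible, otherwise copy" function. The sketch below uses modern NumPy's `ravel()` for that role (an assumption, since the thread is about numarray and Numeric); `zero_first` is a hypothetical function.

```python
import numpy as np


def zero_first(a):
    """Hypothetical function that modifies the flattened form of its argument."""
    flat = a.ravel()   # a view for contiguous input, a copy otherwise
    flat[0] = 0.0


contiguous = np.ones((2, 3))
zero_first(contiguous)
# The write went through the view: contiguous[0, 0] is now 0.0.

non_contiguous = np.ones((3, 2)).T   # same shape, but a transposed view
zero_first(non_contiguous)
# The write landed in a temporary copy: non_contiguous[0, 0] is still 1.0.
```

The same call either changes the caller's array or does not, depending only on the memory layout of the argument.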
otherwise for it to return a view. Yet others always want a copy. So, are three different versions needed? Or options to a function? The drawback of .flat (as an attribute) is there is only one choice for behavior.
I agree. I vote for a method. By the way, is it really impossible to have a discontiguous 1-d array? I'm no wizard at C or C++, but I've worked with the Numeric API enough to see what the problem is. However, it seems that there should be a way to add a "get the n-th element" function or method to the Numarray object that would then work on polymorphic types, one of which would be a rank-1 non-contiguous array. Perhaps there is way too much existing code that relies on the array->strides[n] approach to introduce this now, but I think this kind of thing would be the key to making it easier to write optimized Numarray functions.
I'd very much prefer not to proliferate any more flavors of behavior and just settle on one.
+5 on this. I'd really like SciPy and numarray both to have the goal of merging the two. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Mon, 2004-11-01 at 18:28, Chris Barker wrote:
Perry Greenfield wrote:
This aspect of flat can be considered a wart. There are three different desired behaviors depending on who you talk to. For efficiency reasons, some only want flat (and even ravel) to work if the array is already contiguous; that is, they don't want copies unless they ask for them.
This isn't just efficiency: having a function (or method) that sometimes returns a copy, and sometimes a reference is asking for bugs. What happens if I make a change to the result in a function? Sometimes it will change the parent array, sometimes not.
I looked at this some more and discovered we're doing what Numeric does with the .flat attribute: raise an exception for non-contiguous arrays. So backward compatibility is one motive for keeping .flat the way it is now.
otherwise for it to return a view. Yet others always want a copy. So, are three different versions needed? Or options to a function? The drawback of .flat (as an attribute) is there is only one choice for behavior.
I agree. I vote for a method.
By the way, is it really impossible to have a discontiguous 1-d array?
No. RecArray is based on this: elements in a column are typically spaced by more than the size of one element yet can appear as a single 1D array.
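A small sketch of a discontiguous 1-D array of this kind. It uses a NumPy structured array as a stand-in for numarray's RecArray (an assumption), and the field names `id` and `flux` are made up for the example.

```python
import numpy as np

# Four records, each holding a 4-byte integer followed by an 8-byte float.
records = np.zeros(4, dtype=[("id", np.int32), ("flux", np.float64)])

# One column of the record array: a genuine 1-D array...
flux = records["flux"]

# ...whose elements are spaced by the size of a whole record, not by the
# size of one float, so it is not contiguous.
stride_bytes = flux.strides[0]
element_bytes = flux.itemsize

# It is still a view: writing through it updates the records.
flux[:] = 1.5
```

A single stride is enough to describe such a column, which is why rank-1 non-contiguous arrays pose no problem on their own.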
I'm no wizard at C or C++, but I've worked with the Numeric API enough to see what the problem is. However, it seems that there should be a way to add a "get the n-th element" function or method to the Numarray object that would then work on polymorphic types, one of which would be a rank-1 non-contiguous array.
I think the real issue is that non-contiguous arrays cannot be reshaped to become rank-1 arrays just by munging the strides. Thus, it's hard/impossible to write a .flat which works without making a copy of the array when the original was non-contiguous.
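One way to see why is to list the byte offsets of the elements in logical order: a rank-1 array has a single stride, so those offsets would have to advance by a constant step. The sketch below checks that condition using modern NumPy (an assumption); `element_offsets` and `single_stride_possible` are hypothetical helpers.

```python
import numpy as np


def element_offsets(a):
    """Byte offset of each element, visited in row-major logical order."""
    return [sum(i * s for i, s in zip(index, a.strides))
            for index in np.ndindex(*a.shape)]


def single_stride_possible(a):
    """True if one constant stride could walk all elements in logical order."""
    offsets = element_offsets(a)
    steps = {later - earlier for earlier, later in zip(offsets, offsets[1:])}
    return len(steps) <= 1


base = np.arange(6, dtype=np.int64).reshape(2, 3)

contiguous_ok = single_stride_possible(base)            # constant 8-byte steps
transposed_ok = single_stride_possible(base.T)          # steps alternate 24, -16
sliced_ok = single_stride_possible(base[:, ::2])        # steps alternate 16, 8
```

When the steps are not constant, no choice of a single stride reproduces the element order, so a flat 1-D result has to be built by copying.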
Perhaps there is way too much existing code that relies on the array->strides[n] approach to introduce this now, but I think this kind of thing would be the key to making it easier to write optimized Numarray functions.
Well, there is support now for adding universal functions to numarray. You supply a C function or macro of 1 or 2 inputs and 1 output, and numarray's ufunc machinery applies the function element-wise. The machinery takes care of array shape and non-contiguousness as well as other things such as misalignment, byte swapping, and type coercion. This is demoed in Examples/ufunc. In addition, numarray-1.2 will support functions of M inputs and N outputs. So, with what we have now, you'd be able to write your own cos() ufunc and it would be as efficient as numarray's cos(). Regards, Todd
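The numarray C-level mechanism in Examples/ufunc is not reproduced here. As a loosely analogous sketch in modern NumPy (an assumption), `np.frompyfunc` wraps a scalar Python function of M inputs and N outputs and lets the ufunc machinery handle shape, broadcasting, and non-contiguous inputs. Unlike the C approach described above, it runs at Python speed and returns object arrays, so it illustrates the division of labor, not the performance.

```python
import math

import numpy as np

# One input, one output: an element-wise cosine built from a scalar function.
my_cos = np.frompyfunc(math.cos, 1, 1)

# A non-contiguous input; the ufunc machinery walks it without help from us.
angles = np.linspace(0.0, math.pi, 6).reshape(2, 3).T
cosines = my_cos(angles).astype(np.float64)

# Two inputs, two outputs: quotient and remainder from Python's divmod.
my_divmod = np.frompyfunc(divmod, 2, 2)
quotients, remainders = my_divmod(np.array([7, 9]), 2)
```

The scalar function never sees strides or shapes, which is the property Todd describes for user-supplied C functions.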
Participants (5): Chris Barker, Perry Greenfield, Peter Verveer, Tim Hochberg, Todd Miller