I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there so long as things continue to improve with the language as a whole). I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my collaborators are probably sick of my Numeric Python evangelism (but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...)). Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for a while:
>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
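For comparison, here is a minimal sketch of what consistent keyword handling looks like. It assumes present-day NumPy rather than the Numeric release shown above, and NumPy spells the keyword `dtype` instead of `typecode`:

```python
import numpy as np

# Assumption: modern NumPy, where the keyword is `dtype` rather than `typecode`.
# All three constructors accept the same keyword in the same way.
a = np.arange(10, dtype='d')
o = np.ones(10, dtype='d')
z = np.zeros(10, dtype='d')

print(a.dtype, o.dtype, z.dtype)
```

The point is only that the same keyword works for every constructor, which is the behavior the Numeric session above lacks for zeros().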
Anyway, these little warts that we are discussing probably haven't kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look.

On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation.

Cheers,

Scott

On Mon, Jun 10, 2002 at 06:15:25PM -0500, eric jones wrote:
So one contentious issue a day isn't enough, huh? :-)
An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric.
To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10 element array, where the elements along the first dimension have been summed over rather than those along the most rapidly varying dimension.
>>> x = arange(20)
>>> x.shape = (2,10)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> add.reduce(x)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
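The same reduction can be reproduced as a sketch in present-day NumPy (an assumption; the session above uses Numeric), with the last-axis alternative shown alongside for contrast:

```python
import numpy as np

x = np.arange(20).reshape(2, 10)

# Default reduction axis is the first one (axis=0): the two rows are added.
first = np.add.reduce(x)

# Reducing over the last axis instead sums within each row.
last = np.add.reduce(x, axis=-1)

print(first)  # 10 elements
print(last)   # 2 elements
```

With the first axis, the result has one entry per column; with the last axis, one entry per row.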
The issue here is both consistency across a library and speed.
From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0 which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which functions use which and have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed).
SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations.
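The cache argument rests on memory layout. As an illustrative sketch, assuming present-day NumPy and its default row-major (C-contiguous) layout, the strides show that stepping along the last axis moves to the adjacent element in memory, while stepping along the first axis jumps a whole row:

```python
import numpy as np

# Assumption: modern NumPy with the default C (row-major) memory order.
x = np.zeros((2, 10), dtype=np.float64)

# Bytes to skip when moving one step along each axis.
print(x.strides)

row = x[0]      # varies along the last axis: contiguous in memory
col = x[:, 0]   # varies along the first axis: one element per row
print(row.strides, col.strides)
```

This only shows the layout, not a timing; how much the layout matters in practice depends on array size and on the implementation of each function.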
When using SciPy and Numeric, their function sets are completely co-mingled. When adding SciPy and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric.
So here is what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient for the most cases.
There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1.
It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type:
sum(a,axis=-1)
in command line mode is a real pain.
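As a rough sketch of the alternative, a thin wrapper can carry the default so that interactive calls stay short. The name sum_last is hypothetical and the code assumes present-day NumPy, not Numeric or SciPy as they existed at the time:

```python
import numpy as np

def sum_last(a, axis=-1):
    """Hypothetical helper: sum along `axis`, defaulting to the last axis."""
    return np.add.reduce(np.asarray(a), axis=axis)

a = np.arange(6).reshape(2, 3)

short = sum_last(a)             # no keyword needed at the prompt
explicit = sum_last(a, axis=0)  # the axis can still be given explicitly

print(short)
print(explicit)
```

The keyword stays available for module code, while the common interactive case needs no extra typing.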
Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (i.e. using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions and indexing), but the choice made should be applied as consistently as possible.
We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float,Float32,Complex, etc.) -- yet another debate to come.
Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, the community can grow by 2 orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if the general scientist sees Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community.
Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but also think that is sub-optimal.
Some feel that it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has argued that the current behavior is most compatible with the behavior of other Python sequences. For example,
sum = 0
for subarr in x:
    sum += subarr
acts on the first axis in effect. Likewise
reduce(add, x)
does likewise. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation was of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve) then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis was the default always.
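The sequence-iteration argument above can be checked with a small sketch. It assumes present-day NumPy and Python 3, where reduce lives in functools; the variable names are illustrative only:

```python
from functools import reduce
from operator import add

import numpy as np

x = np.arange(20).reshape(2, 10)

# Iterating over a 2-D array yields x[0], x[1], ...: slices along the first axis.
total = 0
for subarr in x:
    total = total + subarr

via_reduce = reduce(add, x)           # Python's own reduce over the rows
via_ufunc = np.add.reduce(x, axis=0)  # explicit first-axis reduction

print(total)
print(np.array_equal(total, via_reduce), np.array_equal(total, via_ufunc))
```

All three spellings give the same 10-element result, which is the sense in which a first-axis default matches ordinary Python sequence behavior.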
The question is whether there is a consensus for one approach or the other. We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes.
Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions because command line users aren't going to care whether sum() and kurtosis() come from different libraries, they just want them to behave consistently.
eric
Perry
--
Scott M. Ransom
Address: McGill Univ. Physics Dept., 3600 University St., Rm 338, Montreal, QC Canada H3A 2T8
Phone: (514) 398-6492
email: ransom@physics.mcgill.ca
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989