An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) is whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current behavior of Numeric itself, but whether numarray should adopt the same behavior as the current Numeric.

To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10-element array, where the first (least rapidly varying) dimension has been summed over rather than the most rapidly varying one.
>>> x = arange(20)
>>> x.shape = (2,10)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> add.reduce(x)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. Konrad Hinsen, for example, has argued that the current behavior is most compatible with the behavior of other Python sequences. For instance,
sum = 0
for subarr in x:
    sum += subarr
acts on the first axis in effect. Likewise
reduce(add, x)
does the same. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation is of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve), then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis were the default always.

The question is whether there is a consensus for one approach or the other. We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes.

Perry
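For concreteness, the proposed rule looks roughly like this in a Numeric session (the FFT default comes from Numeric's bundled FFT package; treat the exact signatures as illustrative rather than authoritative):

    import Numeric, FFT
    x = Numeric.ones((2, 10))
    # structural reduction: operates on the first axis by default
    Numeric.add.reduce(x).shape        # -> (10,)
    # 'time-series' operation: transforms along the last axis by default
    FFT.fft(x).shape                   # -> (2, 10), one transform per row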
So one contentious issue a day isn't enough, huh? :-)
An issue that has been raised by scipy (most notably Eric Jones and Travis Oliphant) has been whether the default axis used by various functions should be changed from the current Numeric default. This message is not directed at determining whether we should change the current Numeric behavior for Numeric, but whether numarray should adopt the same behavior as the current Numeric.
To be more specific, certain functions and methods, such as add.reduce(), operate by default on the first axis. For example, if x is a 2 x 10 array, then add.reduce(x) results in a 10-element array, where the first dimension has been summed over rather than the most rapidly varying dimension.
>>> x = arange(20)
>>> x.shape = (2,10)
>>> x
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> add.reduce(x)
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
The issue here is both consistency across a library and speed.
From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0, which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which functions use which and have resorted to explicitly using axis=-1 in my code. Unfortunately, many of the Numeric functions that should take axis as a keyword still don't, so you end up just inserting -1 in the argument list (but this is a different issue -- it just needs to be fixed).
SciPy always uses axis=-1 for operations. There are 60+ functions with this convention. Choosing -1 offers the best cache use and therefore should be more efficient. Defaulting to the fastest behavior is convenient because new users don't need any special knowledge of Numeric's implementation to get near peak performance. Also, there is never a question about which axis is used for calculations.

When using SciPy and Numeric, their function sets are completely co-mingled. Adding SciPy's and Numeric's function counts together, it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a standard, it is impossible for the interface to become intuitive because of the exceptions to the rule from Numeric.

So here's what I think. All functions should default to the same axis so that the interface to common functions can become second nature for new users and experts alike. Further, the chosen axis should be the most efficient in the most common cases.

There are actually a few functions that, taken in isolation, I think should have axis=0. take() is an example. But, for the sake of consistency, it too should use axis=-1.

It has been suggested to recommend that new users always specify axis=? as a keyword in functions that require an axis argument. This might be fine when writing modules, but always having to type

>>> sum(a, axis=-1)

in command-line mode is a real pain.

Just a point about the larger picture here... The changes we're discussing are intended to clean up the warts on Numeric -- and, as good as it is overall, these are warts in terms of usability. Interfaces should be consistent across a library. The return types from functions should be consistent regardless of input type (or shape). Default arguments to the same keyword should also be consistent across functions. Some issues are left to debate (i.e., using axis=-1 or axis=0 as default, returning arrays or scalars from Numeric functions, and indexing), but the choice made should be applied as consistently as possible.

We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come.

Changes are going to create some backward incompatibilities, and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size, but I also believe, based on the strength of Python, Numeric, and libraries such as Scientific and SciPy, that the community can grow by two orders of magnitude over the next five years. This kind of growth can't occur if only savvy developers see the benefits of the elegant language. It can only occur if the general scientist sees Python as a compelling alternative to Matlab (and IDL) as their day-in/day-out command-line environment for scientific/engineering analysis. Making the interface consistent is one of several steps to making Python more attractive to this community.

Whether the changes made for numarray should be migrated back into Numeric is an open question. I think they should, but see Konrad's counterpoint. I'm willing for SciPy to be the intermediate step in the migration between the two, but also think that is sub-optimal.
Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. Konrad Hinsen, for example, has argued that the current behavior is most compatible with the behavior of other Python sequences. For instance,
sum = 0
for subarr in x:
    sum += subarr
acts on the first axis in effect. Likewise
reduce(add, x)
does the same. In this sense, Numeric is currently more consistent with Python behavior. However, there are other functions that operate on the most rapidly varying dimension. Unfortunately I cannot currently access my old mail, but I think the rule that was proposed under this argument was that if the 'reduction' operation is of a structural kind, the first dimension is used. If the reduction or processing step is 'time-series' oriented (e.g., FFT, convolve), then the last dimension is the default. On the other hand, some feel it would be much simpler to understand if the last axis were the default always.
The question is whether there is a consensus for one approach or the other. We raised this issue at a scientific Birds-of-a-Feather session at the last Python Conference. The sense I got there was that most were for the status quo, keeping the behavior as it is now. Is the same true here? In the absence of consensus or a convincing majority, we will keep the behavior the same for backward compatibility purposes.
Obviously, I'm more opinionated about this now than I was then. I really urge you to consider using axis=-1 everywhere. SciPy is not the only scientific library, but I think it adds the most functions with a similar signature (the stats module is full of them). I very much hope for a consistent interface across all of Python's scientific functions, because command-line users aren't going to care whether sum() and kurtosis() come from different libraries; they just want them to behave consistently.

eric
I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage. (I'm certainly willing to do some maintenance on my code/modules that are floating around here and there, so long as things continue to improve with the language as a whole.) I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my collaborators are probably sick of my Numeric Python evangelism -- but, I like to think, also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...). Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for a while:
>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Anyway, these little warts that we are discussing probably haven't kept my astronomer friends from switching from IDL, but as things progress and well-known astronomical or other scientific software packages are released based on Python (like pyraf) from well-known groups (like STScI/NASA), they will certainly take a closer look.

On a slightly different note, my hearty thanks to all the developers for all of your hard work so far. Numeric/Numarray+Python is a fantastic platform for scientific computation.

Cheers, Scott
On Mon, 2002-06-10 at 19:55, Scott Ransom wrote:
I have to admit that I agree with all of what Eric has to say here -- even if it does cause some code breakage (I'm certainly willing to do some maintenance on my code/modules that are floating here and there so long as things continue to improve with the language as a whole).
I'm generally of the same opinion.
I do think consistency is a very important aspect of getting Numeric/Numarray accepted by a larger user base (and believe me, my collaborators are probably sick of my Numeric Python evangelism (but I like to think also a bit jealous of my NumPy usage as they continue struggling with one-off C and Fortran routines...)).
Another important factor is the support libraries. I know that something like Simulink (Matlab) is important to many of my colleagues in engineering. Simulink is the Mathworks version of visual programming, which lets the user build a circuit graphically and then have it processed. I believe there was a good start on this sort of thing presented at the last Python Conference, which was very encouraging. Other colleagues require something like a compiler to get C code that will compile on a DSP board from a script and/or design session. I believe something like this would be very beneficial.
Another example of a glaring inconsistency in the current implementation is this little number that has been bugging me for awhile:
>>> arange(10, typecode='d')
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> ones(10, typecode='d')
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> zeros(10, typecode='d')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> zeros(10, 'd')
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
This is now fixed in cvs, along with other keyword problems. The ufunc methods reduce and accumulate also now take a keyword argument in CVS. -Travis
I guess the argument for uniformity is pretty persuasive after all. (I know, I don't fit in on the Net; you can change my mind.)

Actually, don't we have a quick and dirty out here? Suppose we make the more uniform choice for Numarray, and then make a new module, say NumericCompatibility, which defines aliases to everything in Numarray that is the same as Numeric, and for the rest defines functions with the same names but the Numeric defaults, implemented by calling the ones in Numarray. Then changing "import Numeric" to "import NumericCompatibility as Numeric" ought to be enough to get someone working or close to working again.

Someone posted something about "retrofitting" stuff from Numarray to Numeric. I cannot say strongly enough that I oppose this. Numeric itself must be frozen asap and eliminated eventually, or there is no point to having developed a replacement that is easier to expand and maintain. We would have just doubled our workload for nothing.
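A minimal sketch of what such a shim might look like, assuming numarray switched its reduction default to axis=-1 (the particular functions wrapped below are hypothetical; numarray's real export list would determine what actually needs wrapping):

    # NumericCompatibility.py -- hypothetical compatibility shim
    from numarray import *          # aliases for everything that is unchanged
    import numarray

    def sum(a, axis=0):
        # restore Numeric's old axis=0 default by passing it explicitly
        return numarray.sum(a, axis)

    def repeat(a, repeats, axis=0):
        return numarray.repeat(a, repeats, axis)

Then "import NumericCompatibility as Numeric" keeps old code seeing the old defaults while new code gets the uniform ones.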
"eric jones"
The issue here is both consistency across a library and speed.
Consistency, fine. But not just within one package, also between that package and the language it is implemented in.

Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster.
From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0 which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which
If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero.
code. Unfortunately, many of the Numeric functions that should still don't take axis as a keyword, so you end up just inserting -1 in the
That is certainly something that should be fixed, and I suppose no one objects to that.

My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message.

If a uniformization of the default is desired, I vote for axis=0, for two reasons:
1) Consistency with Python usage.
2) Minimization of code breakage.
We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float,Float32,Complex, etc.) -- yet another debate to come.
What needs to be improved in that area?
Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size,
I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely a hypothesis.
Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has
Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference.

Konrad.
Konrad's arguments are also very good. I guess there was a good reason we did all that arguing before -- another issue where there is a Perl-like "more than one way to do it" quandary. I think in my own coding, reduction on the first dimension is the most frequent.
"eric jones"
writes: The issue here is both consistency across a library and speed.
Consistency, fine. But not just within one package, also between that package and the language it is implemented in.
Speed, no. If I need a sum along the first axis, I won't replace it by a sum across the last axis just because that is faster.
The default axis choice influences how people choose to lay out their data in arrays. If the default is to sum down columns, then users lay out their data so that this is the order of computation. This results in strided operations. There are cases where you need to reduce over multiple data sets, etc. which is what the axis=? flag is for. But choosing the default to also be the most efficient just makes sense. The cost is even higher for wrappers around C libraries not written explicitly for Python (which is most of them), because you have to re-order the memory before passing the variables into the C loop. Of course, the axis=0 is faster for Fortran libraries with wrappers that are smart enough to recognize this (Pearu's f2py wrapped libraries now recognize this sort of thing). However, the marriage to C is more important as future growth will come in this area more than Fortran.
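The effect is easy to measure for oneself; a sketch (Python 2 style, timings machine-dependent and purely illustrative):

    import time
    import Numeric

    x = Numeric.ones((1000, 1000), Numeric.Float)

    t0 = time.clock()
    for i in range(10):
        Numeric.add.reduce(x, 0)       # reduces down columns: strided access
    t1 = time.clock()
    for i in range(10):
        Numeric.add.reduce(x, -1)      # reduces along rows: contiguous access
    t2 = time.clock()
    print "axis=0: %.3fs   axis=-1: %.3fs" % (t1 - t0, t2 - t1)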
From the numpy.pdf, Numeric looks to have about 16 functions using axis=0 (or index=0 which should really be axis=0) and, counting FFT, about 10 functions using axis=-1. To this day, I can't remember which
If you weight by frequency of usage, the first group gains a lot in importance. I just scanned through some of my code; almost all of the calls to Numeric routines are to functions whose default axis is zero.
Right, but I think all the reduce operators (sum, product, etc.) should have been axis=-1 in the first place.
code. Unfortunately, many of the Numeric functions that should still don't take axis as a keyword, so you end up just inserting -1 in the
That is certainly something that should be fixed, and I suppose no one objects to that.
Sounds like Travis already did it. Thanks.
My vote is for keeping axis defaults as they are, both because the choices are reasonable (there was a long discussion about them in the early days of NumPy, and the defaults were chosen based on other array languages that had already been in use for years) and because any change would cause most existing NumPy code to break in many places, often giving wrong results instead of an error message.
If a uniformization of the default is desired, I vote for axis=0, for two reasons: 1) Consistency with Python usage.
I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add, x) until Perry pointed it out to me. There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies, unlike standard Python sequences. This was a correct break with consistency in a heavily used area of Python, made for efficiency.

I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different from, and used differently than, lists of lists in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative behavior that is generally better for array operations, as in the case of slices as views, is worth the change.
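The view behavior Eric refers to is easy to demonstrate (Numeric-style session, shown purely for illustration):

    >>> from Numeric import arange
    >>> a = arange(5)
    >>> b = a[1:4]       # a view: b shares a's data
    >>> b[0] = 99
    >>> a
    array([ 0, 99,  2,  3,  4])
    >>> l = range(5)
    >>> m = l[1:4]       # a plain Python slice is a copy
    >>> m[0] = 99
    >>> l
    [0, 1, 2, 3, 4]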
2) Minimization of code breakage.
Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case. Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for.
We should also strive to make it as easy as possible to write generic functions that work for all array types (Int, Float, Float32, Complex, etc.) -- yet another debate to come.
What needs to be improved in that area?
Comparisons of complex numbers. But let's save that debate for later.
Changes are going to create some backward incompatibilities and that is definitely a bummer. But some changes are also necessary before the community gets big. I know the community is already a reasonable size,
I'd like to see evidence that changing the current NumPy behaviour would increase the size of the community. It would first of all split the current community, because many users (like myself) do not have enough time to spare to go through their code line by line in order to check for incompatibilities. That many others would switch to Python if only some changes were made is merely a hypothesis.
True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base. Even in communities where Python is relatively prevalent, like astronomy, I would bet the every-day user base is less than 5% of the whole.

There are a lot of holes to fill (graphics, comprehensive libraries, etc.) before we get up to the capabilities and quality of user interface that these tools have. Some of the interface problems are GUI- and debugger-related. Others are API-related. Inconsistency in a library interface makes it harder to learn and is a wart. Is it as important as a graphics library? Probably not. But while we're building the next-generation tool, we should fix things that make people wonder "why did they do this?". It is rarely a single thing that makes all the difference to a prospective user switching over. It is the overall quality of the tool that will sway them.
Some feel it is contrary to expectations that the least rapidly varying dimension should be operated on by default. There are good arguments for both sides. For example, Konrad Hinsen has
Actually the argument is not for the least rapidly varying dimension, but for the first dimension. The internal data layout is not significant for most Python array operations. We might for example offer a choice of C style and Fortran style data layout, enabling users to choose according to speed, compatibility, or just personal preference.
In a way, as Pearu has shown in f2py, this is already possible by jiggering the stride and dimension entries, so this doesn't even require a change to the array descriptor (I don't think...). We could supply functions that returned a Fortran layout array. This would be beneficial for some applications outside of what we're discussing now that use Fortran extensions heavily. As long as it is transparent to the extension writer (which I think it can be) it sounds fine. I think the default constructor should return a C layout array though, and will be what 99% of the users will use. eric
<Konrad Hinsen writes>: What needs to be improved in that area?
<Eric Jones writes>: Comparisons of complex numbers. But let's save that debate for later.
No, no, let's do it now. ;-) We for one would like to know, for numarray, what should be done.

If I might be presumptuous enough to anticipate what Eric would say, it is that complex comparisons should be allowed, and that they should use all the information in the complex number (real and imaginary) so that they lead to consistent results in sorting.

But the purist argues that comparisons for complex numbers are meaningless. Well, yes, but there are cases in code where you don't wish such comparisons to cause an exception. But even more important, there is at least one case which is practical. It isn't all that uncommon to want to eliminate duplicate values from arrays, and one would like to be able to do that for complex values as well. A common technique is to sort the values and then eliminate all identical adjacent values. A predictable comparison rule would allow that to be easily implemented.

Eric, am I missing anything in this? It should be obvious that we agree with his position, but I am wondering if there are any arguments we have not heard yet that outweigh the advantages we see.

Perry
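A sketch of the sort-then-compare-neighbors idiom Perry describes, using a plain Python 2 list sort so that nothing is assumed about any array sort's complex ordering; the lexicographic rule here is just one possible "predictable comparison rule":

    def cmp_complex(a, b):
        # lexicographic order: real part first, imaginary part second
        return cmp((a.real, a.imag), (b.real, b.imag))

    values = [1+2j, 3+0j, 1+2j, 0+1j]
    values.sort(cmp_complex)
    unique = values[:1]
    for v in values[1:]:
        if v != unique[-1]:            # == and != are well defined for complex
            unique.append(v)
    # unique -> [1j, (1+2j), (3+0j)]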
One can make a case for allowing == and != for complex arrays, but > just doesn't make sense and should not be allowed.
On June 11, 2002 04:56 pm, you wrote:
One can make a case for allowing == and != for complex arrays, but > just doesn't make sense and should not be allowed.
It depends on whether you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable comparison. You _could_ do the same thing with the phases, except you run into the modulo-2pi thing...

Scott
Scott Ransom writes:
On June 11, 2002 04:56 pm, you wrote:
One can make a case for allowing == and != for complex arrays, but > just doesn't make sense and should not be allowed.
It depends if you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable
Sure, but that doesn't give a full order relation for complex numbers. Two different numbers with equal magnitude would be neither equal, nor would one be larger than the other.

I agree with Paul that complex comparison should not be allowed. On the other hand, Perry's argument about sorting makes sense as well. Is there anything that prevents us from permitting arraysort() on complex arrays but not the comparison operators?

Konrad.
On Wed, Jun 12, 2002 at 10:32:12AM +0200, Konrad Hinsen wrote:
Scott Ransom writes:
On June 11, 2002 04:56 pm, you wrote:
One can make a case for allowing == and != for complex arrays, but > just doesn't make sense and should not be allowed.
It depends if you think of complex numbers in phasor form or not. In phasor form, the amplitude of the complex number is certainly something that you could compare with > or < -- and in my opinion, that seems like a reasonable
Sure, but that doesn't give a full order relation for complex numbers. Two different numbers with equal magnitude would be neither equal nor would one be larger than the other.
The comparison operators could be defined to operate on the magnitudes only. In this case you would get the kind of ugly result that two complex numbers with the same magnitude but different phases would be equal. Complex comparisons of this type could be quite useful to those (like me) who do lots of Fourier-domain signal processing.
I agree with Paul that complex comparison should not be allowed. On the other hand, Perry's argument about sorting makes sense as well. Is there anything that prevents us from permitting arraysort() on complex arrays but not the comparison operators?
How do you sort an array of complex numbers if you can't compare them?

Scott
The comparison operators could be defined to operate on the magnitudes only. In this case you would get the kind of ugly result that two complex numbers with the same magnitude but different phases would be equal.
If you want to compare magnitudes, you can do that explicitly without much effort.
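For instance (Numeric-style, shown for illustration):

    from Numeric import array, absolute, greater
    a = array([1+1j, 3+0j])
    b = array([2+0j, 1+1j])
    greater(absolute(a), absolute(b))   # elementwise |a| > |b| -> array([0, 1])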
How do you sort an array of complex numbers if you can't compare them?
You could for example sort by real part first and by imaginary part second. That would be a well-defined sort order, but not a useful definition of comparison in the mathematical sense.

Konrad
On Wed, 12 Jun 2002, Konrad Hinsen wrote:
How do you sort an array of complex numbers if you can't compare them?
You could for example sort by real part first and by imaginary part second. That would be a well-defined sort order, but not a useful definition of comparison in the mathematical sense.
Related discussion has also taken place on the scipy list; see the thread starting at http://www.scipy.org/site_content/mailman?fn=scipy-dev/2002-February/000364....

But here I would like to draw your attention to the suggestion that the sort() function could take an optional argument specifying the comparison method for complex numbers (for real numbers they are all equivalent). Here follows the relevant fragment of the message: http://www.scipy.org/site_content/mailman?fn=scipy-dev/2002-February/000366....

... However, in different applications different conventions may be useful or reasonable for ordering complex numbers. Whatever the convention, its mathematical correctness is irrelevant and cannot be used as an argument for preferring one convention to another. I would propose providing a number of efficient comparison methods for complex (or any) numbers that users may pass to sort functions as an optional argument. For example:

scipy.sort([2,1+2j], cmpmth='abs')       -> [1+2j,2]  # sorts by abs value
scipy.sort([2,1+2j], cmpmth='real')      -> [2,1+2j]  # sorts by real part
scipy.sort([2,1+2j], cmpmth='realimag')  # sorts by real, then by imag
scipy.sort([2,1+2j], cmpmth='imagreal')  # sorts by imag, then by real
scipy.sort([2,1+2j], cmpmth='absangle')  # sorts by abs, then by angle
scipy.sort([2,1+2j], cmpfunc=<user-defined comparison function>)

Note that

scipy.sort([-1,1], cmpmth='absangle') -> [1,-1]

which also demonstrates the arbitrariness of sorting complex numbers. ...

Regards, Pearu
Using the term "comparison operators" is too loose and is causing a communication problem here. There are these comparison operators == and != (group 1) <, >, <=, and >= (group 2) For complex numbers it is easy to define the operators in group 1: x == y iff x.real == y.real and x.imag == y.imag. And, x != y iff (not x == y). I hardly think any other definition would be conceivable. The utility of this definition is questionable, as in most instances one should be making these comparisons with a tolerance, but there at least are cases when it makes sense. For group 2, there are a variety of possible definitions. Just to name three possible > definitions, the greater magnitude, the greater phase mod 2pi, or a radix-type order e.g., x > y if x.real > y.real or (x.real == y.real and x.imag > y.imag). A person can always define a function my_greater_than (c1, c2) to embody one of these definitions, and use it as an argument to a sort routine that takes a function argument to tell it how to sort. What you are arguing about is whether some particular version of this comparison should be "blessed" by attaching it to the operator ">". I do not think one of the definitions is such a clear winner that it should be blessed -- it would mean a casual reader could not guess what the operator means, and ">" does not have a doc string. Therefore I oppose doing so.
I'd be interested to know what IDL does. Does it compare complex numbers? Matlab allows comparisons of complex numbers but just compares the real part. I think this is reasonable. Often, during a calculation of limited precision, one ends up with a complex number when the result is, in a "mathematically pure" sense, real. I guess I trust the user to realize that if they are comparing numbers they know what they mean (only the real parts are compared, so the complex part is ignored).

-Travis
On 12 Jun 2002, Travis Oliphant wrote:
I'd be interested to know what IDL does. Does it compare complex numbers?
Well, that was an interesting question with a surprising answer (at least to me, a long-time IDL user): (1) IDL allows comparisons of complex number using equality and inequality, but attempts to compare using GT, LT, etc. cause an illegal exception. (2) IDL sorts complex numbers by the amplitude. It ignores the phase. Numbers with the same amplitude and different phases are randomly ordered depending on their positions in the original array.
Matlab allows comparisons of complex numbers but just compares the real part. I think this is reasonable. Often during a calculation of limited precision one ends up with a complex number when the result is in a "mathematically pure sense" real.
So neither IDL nor Matlab has what I consider the desirable feature that the sort order be unique, at least to the extent that equal values wind up next to each other in the sorted array. (Sorting by real value and then, for equal real values, by imaginary value would accomplish that.) Since complex numbers can't be fully ordered, there is no single comparison function that can be plugged into a standard sort algorithm and give that result -- it would require a special complex sort algorithm.

I guess if neither of the major array processing systems (that I know about) has this property in their complex sorts, it must not be *that* important. And since I've been using IDL for 13 years without discovering that complex greater-than comparisons are illegal, I guess that must not be an important property either (at least to me :-).

My conclusion now is similar to Paul Dubois's suggestion -- we should allow equality comparisons and sorting. Beyond that, I guess whatever other people want should carry the day, since it clearly doesn't matter to the sorts of things that I do with Numeric!

Rick
I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add, x) until Perry pointed it out to me.
It is an issue in much of my code, which contains stuff written with NumPy in mind as well as code using only standard Python operations (i.e. reduce()) which might however be applied to array objects. I also use arrays and nested lists interchangeably in many situations (NumPy functions accept nested lists instead of array arguments). Especially in interactive use, nested lists are easier to type.
There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with
True, but this affects much fewer programs. Most of my code never modifies arrays after their creation, and then the difference in indexing behaviour doesn't matter.
I don't see choosing axis=-1 as a break with Python -- multi-dimensional arrays are inherently different and used differently than lists of lists
As I said, I often use one or the other as a matter of convenience. I have always considered them similar types with somewhat different specialized behaviour. The most common situation is building up some table with lists (making use of the append function) and then converting the final construct into an array or not, depending on whether this seems advantageous.
in Python. Further, reduce() is a "corner" of the Python language that has been superseded by list comprehensions. Choosing an alternative
List comprehensions work in exactly the same way, by looping over the outermost index.
2) Minimization of code breakage.
Fixes will be necessary for sure, and I wish that wasn't the case. They will be necessary if we choose a consistent interface in either case.
The current interface is not inconsistent. It follows a different logic than what some users expect, but there is a logic behind it. The current rules are the result of lengthy discussions and lengthy tests, though admittedly by a rather small group of people. If you arrange your arrays according to that logic, you almost never need to specify explicit axis arguments.
Choosing axis=0 or axis=-1 will not change what needs to be fixed -- only the function names searched for.
I disagree very much here. The fewer calls are concerned, the fewer mistakes are made, and the fewer modules have to be modified at all. Moreover, the functions that currently use axis=-1 are more specialized and more likely to be called in similar contexts. They are also, in my limited experience, less often called with nested list arguments.

I don't expect fixes to be as easy as searching for function names and adding an axis argument. Python is a very dynamic language, in which functions are objects like all others. They can be passed as arguments, stored in dictionaries and lists, assigned to variables, etc.

In fact, instead of modifying any code, I'd rather write an interface module that emulates the old behaviour, which after all differs only in the default for one argument. The problem with this is that it adds another function call layer, which is rather expensive in Python.

Which makes me wonder why we need this discussion at all. It is almost no extra effort to provide two different C modules that provide the same functions with different default arguments, and neither one needs to have any speed penalty.
True. But I can tell you that we're definitely doing something wrong now. We have a superior language that is easier to integrate with legacy code and less expensive than the best competing alternatives. And, though I haven't done a serious market survey, I feel safe in saying we have significantly less than 1% of the potential user base.
I agree with that. But has anyone ever made a serious effort to find out why the whole world is not using Python? In my environment (which is too small to be representative of anything), the main reason is inertia. Most people don't want to invest any time to learn any new language, no matter what the advantages are (they remain hypothetical until you actually start to use the new language).

I don't know anyone who has started to use Python and then dropped it because he was not satisfied with some aspect of the language or a library module. On the other hand, I do know projects that collapsed after a split in the user community due to some disagreement over minor details.

Konrad.
eric jones wrote:
I think the consistency with Python is less of an issue than it seems. I wasn't aware that add.reduce(x) would generate the same results as the Python version of reduce(add, x) until Perry pointed it out to me. There are some inconsistencies between Python the language and Numeric because of the needs of the Numeric community. For instance, slices create views instead of copies as in Python. This was a correct break with consistency in a heavily used area of Python, made for efficiency.
<Begin Rant>

I think consistency is an issue, particularly for novices. You cite the issue of slices creating views instead of copies as being the correct choice. But this decision is based solely on the perception that views are 'inherently' more efficient than copies, and not on reasons of consistency or usability.

I (a seasoned user) find view behavior to be annoying and have been caught out on this several times. For example, reversing in-place the elements of an array using slices, i.e. A = A[::-1], will give the wrong answer unless you explicitly make a copy before doing the assignment. Copy behavior, on the other hand, will do the right thing. I suggest that many novices will be caught out by this and similar examples, as I have been. Copy behavior for slices can be just as efficient as view behavior, if implemented as copy-on-write.

The beauty of Python is that it allows the developer to spend much more time on consistency and usability issues than on implementation issues. Sadly, I think much of Numeric development is based solely on implementation issues, to the detriment of consistency and usability.

I don't have enough experience to say definitively whether axis=0 should be preferred over axis=-1 or vice versa. But it does appear that for the most general cases axis=0 is probably preferred. This is the default for APL and J, the programming languages on which Numeric is based. Should we not continue to follow their lead? It might be nice to see a list of examples where axis=0 is the preferred default, and the same for axis=-1.

<End Rant>

-- Paul Barrett, Space Telescope Science Institute
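One version of the trap, as a sketch (exactly how an overlapping assignment misbehaves depends on the implementation's copy loop, so the point here is the hazard, not a specific output):

    import Numeric

    A = Numeric.arange(5)
    A[:] = A[::-1]                 # RHS is a view of A itself: elements may be
                                   # overwritten before they are read

    B = Numeric.arange(5)
    B[:] = Numeric.array(B[::-1])  # force a copy first; always yields [4,3,2,1,0]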
Paul Barrett wrote:
I think consistency is an issue, particularly for novices. You cite ...
Finally a contribution that I can fully agree with :-)
I don't have enough experience to say definitively whether axis=0 should be preferred over axis=-1 or vice versa. But it does appear that for the most general cases axis=0 is probably preferred. This is the default in the APL and J programming languages, on which Numeric is based. Should we not continue to follow their lead? It might be nice to see
This is the internal logic I referred to briefly earlier, but I didn't have the time to explain it in more detail. Now I have :-)

The basic idea is that an array is seen as an array of array values. The N dimensions are split into two parts: the first N1 dimensions describe the shape of the "total" array, and the remaining N2 = N - N1 dimensions describe the shape of the array-valued elements of the array. I suppose some examples will help:

- A rank-1 array could be seen either as a vector of scalars (N1 = 1) or as a scalar containing a vector (N1 = 0); in practice there is no difference between these views.

- A rank-2 array could be seen as a matrix (N1 = 2), as a vector of vectors (N1 = 1), or as a scalar containing a matrix (N1 = 0). The first and the last come down to the same thing, but the middle one doesn't.

- A discretized vector field (i.e. one 3D vector value for each point on a 3D grid) is represented by a rank-6 array, with N1 = 3 and N2 = 3.

Array operations are divided into two classes, "structural" and "element" operations. Element operations do something to each individual element of an array, returning a new array with the same "outer" shape, although the element shape may be different. Structural operations work on the outer shape, returning a new array with a possibly different outer shape but the same element shape.

The most frequent element operations are addition, multiplication, etc., which work on scalar elements only; they need no axis argument at all. Element operations that work on rank-1 elements have a default axis of -1; I think FFT has been quoted as an example a few times. There are no element operations that work on higher-rank elements, but they are imaginable: a 2D FFT routine would default to axis=-2.

Structural operations, which are by far the most frequent after scalar element operations, default to axis=0. They include reduction and accumulation, sorting, selection (take, repeat, ...) and some others.

I hope this clarifies the choice of default axis arguments in the current NumPy. It is most definitely not arbitrary or accidental. If you follow the data layout principles explained above, you almost never need to specify an explicit axis argument.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
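[Editor's note: a small sketch of Konrad's structural/element split, with numpy standing in for the Numeric of the thread; np.add.reduce and np.fft.fft keep the same axis=0 and axis=-1 defaults he describes.]

    import numpy as np

    # A (4, 3) array read as a length-4 outer vector (N1 = 1) whose
    # elements are length-3 vectors (N2 = 1).
    x = np.arange(12.0).reshape(4, 3)

    # Structural operation: combines the outer elements; defaults to axis=0.
    print(np.add.reduce(x))       # shape (3,): one sum per element slot

    # Element operation on rank-1 elements: acts inside each element,
    # defaulting to axis=-1.
    print(np.fft.fft(x).shape)    # (4, 3): one length-3 FFT per outer element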
The users of Numeric at PCMDI found the 'view' semantics so annoying that they insisted their CS staff write a separate version of Numeric just to avoid it. We have since gotten out of that mess but that is the reason MA has copy semantics. Again, this is another issue where one is fighting over the right to 'own' the operator notation. I believe that copy semantics should win this one because it is a **proven fact** that scientists trip over it, and it is consistent with Python list semantics. People who really need view semantics could get it as previously suggested by someone, with something like x.sub[10:12, :]. There are now dead horses all over the landscape, and I for one am going to shut up.
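[Editor's note: a rough sketch of how an explicit view accessor along the lines of the x.sub[10:12, :] idea could look. The class and attribute names are hypothetical, not any real Numeric or MA API; numpy is used for illustration.]

    import numpy as np

    class _SubAccessor:
        """Hypothetical helper: explicit view-style slicing via arr.sub[...]."""
        def __init__(self, arr):
            self._arr = arr
        def __getitem__(self, index):
            # Plain ndarray indexing: basic slices return views.
            return self._arr[index]

    class CopyArray(np.ndarray):
        """Array whose [] slicing copies, like Python lists; .sub gives views."""
        def __getitem__(self, index):
            result = np.ndarray.__getitem__(self, index)
            return result.copy() if isinstance(result, np.ndarray) else result
        @property
        def sub(self):
            return _SubAccessor(np.asarray(self))

    a = np.arange(10).view(CopyArray)
    b = a[2:5]             # a copy: mutating b leaves a alone
    v = a.sub[2:5]         # an explicit view into a's memory
    v[0] = 99
    print(np.asarray(a))   # [ 0  1 99  3  4  5  6  7  8  9]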
<Paul Dubois writes>:
There are now dead horses all over the landscape, and I for one am going to shut up.
Not enough dead horses for me :-). But seriously, I would like to hear from others about this issue (I already knew what Paul, Paul, Eric, Travis and Konrad felt about this before it started up). You can either post to the mailing list or email directly if you are the shy, retiring type. Perry
participants (9)
- eric jones
- Konrad Hinsen
- Paul Barrett
- Paul F Dubois
- Pearu Peterson
- Perry Greenfield
- Rick White
- Scott Ransom
- Travis Oliphant