[Numpy-discussion] RE: default axis for numarray
eric jones
eric at enthought.com
Mon Jun 10 16:16:03 EDT 2002
So one contentious issue a day isn't enough, huh? :-)
> An issue that has been raised by scipy (most notably Eric Jones
> and Travis Oliphant) has been whether the default axis used by
> various functions should be changed from the current Numeric
> default. This message is not directed at determining whether we
> should change the current Numeric behavior for Numeric, but whether
> numarray should adopt the same behavior as the current Numeric.
>
> To be more specific, certain functions and methods, such as
> add.reduce(), operate by default on the first axis. For example,
> if x is a 2 x 10 array, then add.reduce(x) results in a
> 10 element array, where elements in the first dimension have
> been summed over rather than the most rapidly varying dimension.
>
> >>> x = arange(20)
> >>> x.shape = (2,10)
> >>> x
> array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
>        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
> >>> add.reduce(x)
> array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
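For readers who want to reproduce the quoted example today, here is a minimal sketch. It assumes NumPy as a stand-in for Numeric (the thread predates NumPy); the variable names are illustrative only. It shows the two candidate defaults side by side.

```python
import numpy as np

# Same 2 x 10 array as in the quoted example.
x = np.arange(20).reshape(2, 10)

# Reduce over the first axis: the two rows are added element by element.
first_axis = np.add.reduce(x, axis=0)

# Reduce over the last (most rapidly varying) axis: one sum per row.
last_axis = np.add.reduce(x, axis=-1)

print(first_axis.tolist())  # [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
print(last_axis.tolist())   # [45, 145]
```

The first result has 10 elements and the second has 2, so the choice of default changes the shape of the answer as well as its values.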
The issue here is both consistency across a library and speed.
From the numpy.pdf, Numeric looks to have about 16 functions using
axis=0 (or index=0 which should really be axis=0) and, counting FFT,
about 10 functions using axis=-1. To this day, I can't remember which
functions use which and have resorted to explicitly using axis=-1 in my
code. Unfortunately, many of the Numeric functions that should take axis
as a keyword still don't, so you end up just inserting -1 in the
argument list (but this is a different issue -- it just needs to be
fixed).
SciPy always uses axis=-1 for operations. There are 60+ functions with
this convention. Choosing -1 offers the best cache use and therefore
should be more efficient. Defaulting to the fastest behavior is
convenient because new users don't need any special knowledge of
Numeric's implementation to get near peak performance. Also, there is
never a question about which axis is used for calculations.
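The cache argument rests on memory layout. The sketch below assumes NumPy in place of Numeric and a default C-ordered array. It only inspects strides and makes no timing claim; whether the layout yields a measurable speedup depends on the implementation and the array size.

```python
import numpy as np

# C-ordered 2 x 10 array of 8-byte integers.
x = np.arange(20, dtype=np.int64).reshape(2, 10)

itemsize = x.itemsize                # 8 bytes per element
row_stride, col_stride = x.strides   # bytes to step along axis 0 and axis -1

# Along the last axis, neighbouring elements are adjacent in memory.
last_axis_is_contiguous = (col_stride == itemsize)

print(x.strides)                # (80, 8)
print(last_axis_is_contiguous)  # True
```

Stepping along the last axis moves 8 bytes at a time, while stepping along the first axis jumps a whole row (80 bytes here).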
When using SciPy and Numeric, their function sets are completely
co-mingled. When adding SciPy and Numeric's function counts together,
it is 70 to 16 for axis=-1 vs. axis=0. Even though SciPy chose a
standard, it is impossible for the interface to become intuitive because
of the exceptions to the rule from Numeric.
So here's what I think. All functions should default to the same axis so
that the interface to common functions can become second nature for new
users and experts alike. Further, the chosen axis should be the most
efficient for the most cases.
There are actually a few functions that, taken in isolation, I think
should have axis=0. take() is an example. But, for the sake of
consistency, it too should use axis=-1.
It has been suggested to recommend that new users always specify axis=?
as a keyword in functions that require an axis argument. This might be
fine when writing modules, but always having to type:
>>> sum(a,axis=-1)
in command line mode is a real pain.
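One way to avoid retyping the keyword in an interactive session is to bind the default once. This is only a sketch, assuming NumPy's np.sum as a stand-in for Numeric's sum() (np.sum's own default differs: with no axis it sums every element). The name sum_last is hypothetical.

```python
from functools import partial

import numpy as np

# Hypothetical helper: np.sum with axis=-1 pre-bound as the default.
sum_last = partial(np.sum, axis=-1)

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

row_sums = sum_last(a)           # uses the bound default, axis=-1
col_sums = sum_last(a, axis=0)   # the keyword can still be overridden

print(row_sums.tolist())  # [3, 12]
print(col_sums.tolist())  # [3, 5, 7]
```

This is a per-user workaround, not a substitute for the library choosing one default.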
Just a point about the larger picture here... The changes we're
discussing are intended to clean up the warts on Numeric -- and, as good
as it is overall, these are warts in terms of usability. Interfaces
should be consistent across a library. The return types from functions
should be consistent regardless of input type (or shape). Default
arguments to the same keyword should also be consistent across
functions. Some issues are left to debate (e.g. using axis=-1 or axis=0
as default, returning arrays or scalars from Numeric functions and
indexing), but the choice made should be applied as consistently as
possible.
We should also strive to make it as easy as possible to write generic
functions that work for all array types (Int, Float, Float32, Complex,
etc.) -- yet another debate to come.
Changes are going to create some backward incompatibilities and that is
definitely a bummer. But some changes are also necessary before the
community gets big. I know the community is already a reasonable size,
but I also believe, based on the strength of Python, Numeric, and
libraries such as Scientific and SciPy, the community can grow by 2
orders of magnitude over the next five years. This kind of growth can't
occur if only savvy developers see the benefits of the elegant language.
It can only occur if the general scientist sees Python as a compelling
alternative to Matlab (and IDL) as their day-in/day-out command line
environment for scientific/engineering analysis. Making the interface
consistent is one of several steps to making Python more attractive to
this community.
Whether the changes made for numarray should be migrated back into
Numeric is an open question. I think they should, but see Konrad's
counterpoint. I'm willing for SciPy to be the intermediate step in the
migration between the two, but also think that is sub-optimal.
>
> Some feel it is contrary to expectations that the least rapidly
> varying dimension should be operated on by default. There are
> good arguments for both sides. For example, Konrad Hinsen has
> argued that the current behavior is most compatible with the behavior
> of other Python sequences. For example,
>
> >>> sum = 0
> >>> for subarr in x:
> ...     sum += subarr
>
> acts on the first axis in effect. Likewise
>
> >>> reduce(add, x)
>
> does likewise. In this sense, Numeric is currently more consistent
> with Python behavior. However, there are other functions that
> operate on the most rapidly varying dimension. Unfortunately
> I cannot currently access my old mail, but I think the rule
> that was proposed under this argument was that if the 'reduction'
> operation was of a structural kind, the first dimension is used.
> If the reduction or processing step is 'time-series' oriented
> (e.g., FFT, convolve) then the last dimension is the default.
> On the other hand, some feel it would be much simpler to understand
> if the last axis was the default always.
>
> The question is whether there is a consensus for one approach or
> the other. We raised this issue at a scientific Birds-of-a-Feather
> session at the last Python Conference. The sense I got there was
> that most were for the status quo, keeping the behavior as it is
> now. Is the same true here? In the absence of consensus or a
> convincing majority, we will keep the behavior the same for backward
> compatibility purposes.
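The sequence-consistency argument quoted above can be checked directly. This is a minimal sketch assuming NumPy in place of Numeric and Python 3, where reduce lives in functools; the variable names are illustrative.

```python
from functools import reduce
import operator

import numpy as np

x = np.arange(20).reshape(2, 10)

# Iterating over an array yields sub-arrays along the first axis.
total = np.zeros(10, dtype=x.dtype)
for subarr in x:
    total += subarr

# Python's generic reduce therefore also combines along the first axis.
via_reduce = reduce(operator.add, x)

# The ufunc reduction with axis=0 matches both.
via_ufunc = np.add.reduce(x, axis=0)

print(total.tolist() == via_reduce.tolist() == via_ufunc.tolist())  # True
```

All three give the same 10-element result, which is the sense in which an axis=0 default mirrors ordinary Python sequence behavior.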
Obviously, I'm more opinionated about this now than I was then. I
really urge you to consider using axis=-1 everywhere. SciPy is not the
only scientific library, but I think it adds the most functions with a
similar signature (the stats module is full of them). I very much hope
for a consistent interface across all of Python's scientific functions
because command line users aren't going to care whether sum() and
kurtosis() come from different libraries; they just want them to behave
consistently.
eric
>
> Perry