[Numpy-discussion] Numarray design announcement

Thu Jul 25 15:18:04 EDT 2002

"Paul F Dubois" <paul at pfdubois.com> writes:

> 
> During recent months there has been a lot of discussion about Numarray
> and whether or not it should differ from Numeric in certain ways. We
> have reviewed this lengthy discussion and come to some conclusions about
> what we plan to do. The discussion has been valuable in that it took a
> whole new "generation" back through the considerations that the
> "founding fathers" debated when Numeric Python was designed.
[...] 
> Decisions
> 
> Numarray will have the same Python interface as Numeric except for the
> exceptions discussed below. 
[...] 
> 2. Currently, if the result of an index operation x[i] results in a
> scalar result, the result is converted to a similar Python type. For
> example, the result of array([1,2,3])[1] is the Python integer 2. This
> will be changed so that the result of an index operation on a Numarray
> array is always a Numarray array. Scalar results will become rank-zero
> arrays (i.e., shape () ).
> 
[...]
> 
> 4. The Numarray version of MA will no longer have copy semantics on
> indexing but instead will be consistent with Numarray. (The decision to
> make MA differ in this regards was due to a need for CDAT to be backward
> compatible with a local variant of Numeric; the CDAT user community no
> longer feels this was necessary).
[...]

As one of the people who argued for interface changes in numarray (mainly copy
semantics for slicing), let me say that I welcome this announcement, which
clarifies many issues. Although I still believe that copy behavior would be
preferable in principle, I think that continuity and backwards compatibility
with Numeric is a sufficient reason to stick to the old behavior (now that
numarray strives to be largely compatible) [1]. In a similar vein I also greatly
welcome the change to view semantics in MA, because I feel that internal
consistency is vital.

Apart from my being a heavy Numeric user, these interface issues are also quite
important to me because I have been working for some time on a fully-featured
matrix [2] class which I wanted to be both

a) compatible with Numeric and numarray (so that it would ideally make no
   difference to the user which of the two libraries he'd be using as a
   "backend" to the matrix class).

b) consistent in usage with numarray's interface wherever feasible (i.e. not
   too much of a compromise on usability).

This turned out to be much more of a hassle than I would have anticipated,
because, contrary to what the compatibility section of the manual seemed to
suggest, I found numarray to be incompatible in a variety of ways (even making
it impossible to write *forward* compatible code without writing additional
wrapping functions). Just as an example, there was no simple way that would
work across both versions to do something as common as creating an int array
(with both parameter names and positions differing):

Numeric (21):       array(sequence, typecode=None, copy=1, savespace=0)
numarray (0.3.3?) : array(buffer=None, shape=None, type=None)
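
A minimal sketch of the kind of wrapping function this makes necessary is
given below. The two constructors are stand-ins that only mimic the
signatures quoted above (they are not the real Numeric or numarray
functions), and ``make_int_array``, the lookup table and the type labels
passed to the constructors are hypothetical names chosen for illustration:

```python
# Stand-ins that only mimic the two signatures quoted above; they are NOT
# the real Numeric or numarray constructors.
def numeric_array(sequence, typecode=None, copy=1, savespace=0):
    return ("Numeric", list(sequence), typecode)


def numarray_array(buffer=None, shape=None, type=None):
    return ("numarray", list(buffer), type)


# Hypothetical lookup table: which keyword carries the element type, and
# which (placeholder) value requests an integer array for each backend.
_INT_TYPE_ARGS = {
    numeric_array: {"typecode": "l"},
    numarray_array: {"type": "Int32"},
}


def make_int_array(backend, values):
    """Create an int array without the caller knowing the backend's signature."""
    return backend(values, **_INT_TYPE_ARGS[backend])


print(make_int_array(numeric_array, [1, 2, 3]))
print(make_int_array(numarray_array, [1, 2, 3]))
```

The point is only that the caller has to go through an extra layer because
neither the keyword name nor the argument position is shared.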

As for b) this obviously turned out to be a moving target, but I hope that now
the final shape of things is getting reasonably clear and I'm now for example
determined to have view slicing behavior for my matrix class, too.

Nonetheless, for me a few issues still remain.

Most importantly, numarray doesn't provide the same degree of polymorphism as
Numeric. One of the chief reasons given as to why Numeric's design is based
around functions rather than methods is that it enables greater generality
(e.g. allowing one to ``sum`` over all sorts of sequence types). Consequently
the role of methods and attributes was largely limited to special methods and
to functionality that only made sense for array objects. This is more than
just a neat convenience -- because of the resulting polymorphism it is easy to
write fairly general code and define new kinds of numeric classes that can
seamlessly be passed to Numeric functions (e.g. one can also ``sum``
Matrices).

I find it highly undesirable that numarray apparently doesn't follow this
design rationale and that the division of labour between functions and
methods/attributes has been blurred (or so it appears to me -- maybe this is
some lack of insight on my part).  That numarray versions before 0.3.4 were
missing functions such as ``shape`` (which is also quite handy for other
sequence types) was largely an inconvenience, but the fact that numarray
functions generally only operate on scalars, ``tuple``s and ``list``s (apart
from, obviously, numarray arrays) is in my eyes a significant shortcoming.

In contrast, Numeric functions would operate on any type that had an __array__
method to return an array representation of itself. The explicit checking for
a type that numarray uses (via constructs à la type(a) == types.ListType)
flies in the face of standard Python sensibilities, places arbitrary limits
on the kinds of objects that numarray users can conveniently work with, and
creates a significant hurdle for anyone defining new kinds of numerical
objects.
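
To make the contrast concrete, here is a small pure-Python sketch. It does
not use Numeric or numarray at all: ``Matrix``, ``total_typechecked`` and
``total_protocol`` are hypothetical names, and a nested list stands in for
the array representation that an ``__array__`` method would return:

```python
class Matrix:
    """Hypothetical numeric class that is neither a list nor a tuple."""

    def __init__(self, rows):
        self._rows = [list(row) for row in rows]

    def __array__(self):
        # A nested list stands in for a real array representation here.
        return self._rows


def total_typechecked(obj):
    # Mirrors the explicit type test criticised above: anything that is
    # not literally a list or tuple is rejected.
    if type(obj) not in (list, tuple):
        raise TypeError("unsupported type: %s" % type(obj).__name__)
    return sum(sum(row) for row in obj)


def total_protocol(obj):
    # Ask the object for an array representation if it offers one.
    if hasattr(obj, "__array__"):
        obj = obj.__array__()
    return sum(sum(row) for row in obj)


m = Matrix([[1, 2], [3, 4]])
print(total_protocol(m))  # accepted via __array__
try:
    total_typechecked(m)
except TypeError as exc:
    print("rejected:", exc)
```

The protocol-based function accepts the new class without knowing anything
about it, whereas the type-checked one has to be edited for every new class.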

For example, the design of my matrix class depends on the fact that Numeric
functions also accept objects with __array__ methods (such as my matrix
class). Even if I invested the substantial amount of work that would be needed
to redesign a less general version that wouldn't rely on this property, one of
the key virtues of my class, namely the ability to transparently replace
Numeric arrays in most cases where they are used as matrices, would be
lost. These two reasons would presumably be sufficient for me not to switch to
numarray if I can at all avoid it, so I really hope that numarray will
also grow an __array__ protocol or something equivalent.

This is the only point that is really vital to me, but there are others that
I'd rather see reconsidered. As I said, I liked the division of labor between
functions and methods/attributes in Numeric and the motivations behind it, as
far as I understand them. numarray arrays, however, have grown methods like
``argsort`` and ``diagonal`` that seem somewhat unmotivated to me (and some of
which cause problems for my matrix class). Similarly, why is there e.g. a
``.rank`` attribute but a ``.type()`` method? If anything one would expect
type to be an attribute and rank a method, since the type is actually some
intrinsic property that needs to be stored (and could even be plausibly
assigned to, with results like an ``astype`` call) whereas ``size`` and
``rank`` have no "real" existence as they are only computed from the shape and
modifying them makes no sense.
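
The distinction can be sketched as follows. ``ArraySketch`` is a hypothetical
toy class, not numarray's array type; it only illustrates which pieces are
stored state and which are derived from the shape:

```python
from functools import reduce
from operator import mul


class ArraySketch:
    """Hypothetical array-like object; not numarray's actual class."""

    def __init__(self, shape, type_):
        self.shape = tuple(shape)  # stored state
        self.type = type_          # stored state, hence a plain attribute

    def rank(self):
        # Derived: the number of dimensions is the length of the shape.
        return len(self.shape)

    def size(self):
        # Derived: the element count is the product of the shape entries
        # (1 for a rank-zero array with shape ()).
        return reduce(mul, self.shape, 1)


a = ArraySketch((2, 3, 4), "Float64")
print(a.type, a.rank(), a.size())
```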

TMTOWTDI is the road to Perl, so I'd really prefer to avoid duplicate
functionality à la ``rank(a)`` and ``a.rank`` and generally reserve attributes
and methods for array-specific functionality.

One area where TMTOWTDI seems to have run amok (several ways to do something
but IMHO all broken) is flattened representations of arrays. All these
expressions aim to produce a flattened version of ``a``:

``ravel(a)``, ``a.ravel()``, ``a.getflat()``/ ``a.flat``

'Aim' in this context is some sort of euphemism -- the only one for which it
is possible to determine at compile time that it will do anything apart from
raising an exception is ``ravel(a)`` -- not that one could know *what* it will
do before the code is actually run (return a flattened copy of ``a`` or a
flattened view), but never mind. Yuck. I think this really needs fixing
(deprecating, rather than removing or changing incompatibly, where felt
necessary).

Something else, which I consider less important, however: is it really
necessary to have both 'type' and 'typecode'?  Wouldn't it be enough to just
stick with typecode, along the following lines (potentially issuing
deprecation warnings where appropriate):

  a.typecode()

returns a type object (e.g. Float32).

  array([1,2,3], typecode=Float64)

behaves the same as

  array([1,2,3], typecode='d')

Float64 etc. are already defined in Numeric, so it's easy to write
forward-compatible code, and although hunting down instances of

  if a.typecode() == 'd':

presumably wouldn't be that difficult, incompatibility could most likely
almost be eliminated by making ``Float64 == 'd'`` return true.
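
The last suggestion, letting a type object compare equal to its old typecode
string, could look roughly like the sketch below. ``TypeObject`` is a
hypothetical class written for illustration (it is not how Numeric or
numarray define their types), and it assumes Numeric's convention that 'f'
denotes single and 'd' double precision:

```python
class TypeObject:
    """Hypothetical type object that also answers to its typecode string."""

    def __init__(self, name, typecode):
        self.name = name
        self.typecode = typecode

    def __eq__(self, other):
        if isinstance(other, str):
            # Old-style code compares against a one-letter typecode.
            return other == self.typecode
        if isinstance(other, TypeObject):
            return other.typecode == self.typecode
        return NotImplemented

    def __hash__(self):
        # Keep hashing consistent with equality to the typecode string.
        return hash(self.typecode)

    def __repr__(self):
        return self.name


Float32 = TypeObject("Float32", "f")
Float64 = TypeObject("Float64", "d")

print(Float64 == "d", "d" == Float64, Float32 == "d")
```

With something like this, a test such as ``a.typecode() == 'd'`` would keep
working even though ``typecode()`` returns an object rather than a string.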

Sticking to the old name typecode also has the advantage that it is fairly
unique and unambiguous (just try grep'ing for type vs. typecode). 

I must say that, apart from the switch to type objects, I don't fully
understand the differences between the numeric types in old Numeric and
numarray and the motivation behind them. As far as I can see the emphasis with
Numeric was to stay flexible with respect to different hardware and increasing
word sizes (i.e. to only guarantee minimum precision) and to provide some
reasonable "default" size for each type (e.g. ``Float`` being double precision
[3]). This approach is maybe somewhat similar to the Python core (floats and
ints can have different sizes, depending on the underlying platform). In
numarray the emphasis seems to have shifted to guaranteeing the actual size in
memory (if in a few years' time most calculations are done with 128-bit
precision then that's maybe not such a good idea, but I have no clue how
likely this is to happen).
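
The two philosophies can be illustrated by analogy with Python's standard
``struct`` module, which offers both platform-dependent ("native") and
guaranteed ("standard") sizes. This is only an analogy: ``struct`` format
codes are not Numeric typecodes, even though some of the letters coincide.

```python
import struct

CODES = "ilfd"  # C int, long, float, double

# Native mode: sizes follow the platform's C compiler and may differ
# between machines (a C long is commonly 4 or 8 bytes).
native = {code: struct.calcsize(code) for code in CODES}

# Standard mode ("=" prefix): sizes are fixed by the struct module,
# independent of the platform.
standard = {code: struct.calcsize("=" + code) for code in CODES}

print("native:  ", native)
print("standard:", standard)
```

The native table corresponds to the "minimum precision, platform default"
view, the standard table to the "guaranteed size in memory" view.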

Is this shift of emphasis also responsible for the decision to
have indexing operations always return arrays rather than scalars (including
ones defined by numarray in cases where there is no plain-Python equivalent)?

Will all other functions (e.g. min) continue to return scalars?

[BTW can anyone explain to me the difference between Int and Int32 (typecodes
'i' and 'l')?]

Anyway, my apologies if I come across as too negative or if some of the points
are misinformed. I really think that the recent changes to numarray and this
announcement are a great step forward to a smooth transition of the whole
community from Numeric to numarray, which will play an important role in
consolidating Python's role in scientific computing.

night,

alex

Footnotes: 
[1]  I think it might be beneficial, however, to add an explicit note to the
     manual that alerts users to the fact that small slices can keep alive
     very large arrays, because I am under the impression that this is not
     immediately obvious to everyone and can cause puzzling problems.
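
     The effect can be reproduced without any array library, using a
     ``memoryview`` slice of a ``bytearray`` as a stand-in for an array
     view (an assumption made purely for illustration; the names below
     are arbitrary):

```python
big = bytearray(10 ** 6)       # stand-in for a very large array
small = memoryview(big)[:10]   # a ten-byte view; nothing is copied

small[0] = 255                 # writes through to the underlying buffer
shared = big[0]                # the large buffer sees the change

del big                        # the name is gone ...
kept_alive = len(small.obj)    # ... but the whole buffer is still held

independent = bytes(small)     # an explicit copy holds only ten bytes

print(shared, kept_alive, len(independent))
```

     Only the explicit copy is free of the large buffer; the view keeps all
     of it in memory for as long as the view itself exists.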

[2]  I moaned on this list some months ago that doing linear algebra with
     Numeric arrays was often cumbersome and inefficient (and the Matrix
     class that already comes with Numeric is rather limited). My (currently
     alpha) matrix class attempts to address these issues and also provides a
     much more flexible 'pluggable' output formatting (matlab-like, amongst
     others, which I guess many people will find much more readable; but the
     standard array-like formatting is also available).

[3]  As an aside: maybe ``type="Float"`` in numarray should therefore *not* be
     equivalent to ``type=Float32`` but to ``type=Float64``, given that these
     strings seem to just be there for backwards compatibility?

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/