At numpy.sf.net you will find a posting from Perry Greenfield and me detailing the design decisions we have taken with respect to Numarray. What follows is the text of that message without the formatting. We ask for your understanding about those decisions that differ from the ones you might prefer.

Numarray's Design

Paul F. Dubois and Perry Greenfield

Numarray is the new implementation of the Numeric Python extension. It is our intention that users will change as rapidly as possible to the new module when we decide it is ready. The present Numeric Python team will cease supporting Numeric after a short transition period.

During recent months there has been a lot of discussion about Numarray and whether or not it should differ from Numeric in certain ways. We have reviewed this lengthy discussion and come to some conclusions about what we plan to do. The discussion has been valuable in that it took a whole new "generation" back through the considerations that the "founding fathers" debated when Numeric Python was designed.

There are literally tens of thousands of Numerical Python users. These users may represent only a tiny percentage of potential users but they are real users today with real code that they have written, and breaking that code would represent real harm to real people.

Most of the issues discussed recently were discussed at length when Numeric was first designed. Some decisions taken then represent a choice that was simply a choice among valid alternatives. Nevertheless, the choice was made, and to arbitrarily now make a different choice would be difficult to justify.

In arguing about Python's indentation, we often see heart-felt arguments from opponents who have sincere reasons for feeling as they do. However, many of the pitfalls they point to do not seem to actually occur in real life very often. We feel the same way about many arguments about Numeric Python.
The view / copy argument, for example, claims that beginners will make errors with view semantics. Well, some do, but not very often, and not twice. It is just one of many differences that users need to adapt to when learning an entity-object model such as Python's when they are used to variable semantics such as in Fortran or C. Similarly, we do not receive massive reports of confusion about differing default values for the axis keyword -- there was a rationale for the way it is now, and although one could propose a different rationale for a different choice, it would be just a choice.

Decisions

Numarray will have the same Python interface as Numeric, with the exceptions discussed below.

1. The Numarray C API includes a compatibility layer consisting of some of the members of the Numeric C API. For details on compatibility at the C level see http://telia.dl.sourceforge.net/sourceforge/numpy/numarray.pdf , pdf pages 78-81. Since no formal decision was ever made about what parts of the Numeric C header file were actually intended to be publicly available, do not expect complete emulation.

   Numarray's current view of arrays in C, using either native or emulation C-APIs, is that array data can be mutated, but array properties cannot. Thus, an existing Numeric extension function which tries to change the shape or strides of an array in C is more of a porting challenge, possibly requiring a Python wrapper. Depending on what kind of optimization we do, this restriction might be lifted. For the Numeric extensions already ported to Numarray (RandomArray, LinearAlgebra, FFT), none of this was an issue.

2. Currently, if an index operation x[i] yields a scalar, the result is converted to a similar Python type. For example, the result of array([1,2,3])[1] is the Python integer 2. This will be changed so that the result of an index operation on a Numarray array is always a Numarray array.
Scalar results will become rank-zero arrays (i.e., shape () ).

3. Currently, binary operations involving Numeric arrays and Python scalars use the precision of the Python scalar to help determine the precision of the result. In Numarray, the precision of the array will have precedence in determining the precision of the outcome. Full details are available in the Numarray documentation.

4. The Numarray version of MA will no longer have copy semantics on indexing but instead will be consistent with Numarray. (The decision to make MA differ in this regard was due to a need for CDAT to be backward compatible with a local variant of Numeric; the CDAT user community no longer feels this was necessary).

Some explanation about the scalar change is in order. Currently, much coding in Numeric-based applications must be devoted to handling the fact that after an index operation, the programmer cannot assume that the result is an array.

So, what are the consequences of the change? A rank-zero array will interact as expected with most other parts of Python. When it does not, the most likely result is a type error. For example, let x = array([1,2,3]). Then [1,2,3][x[0]] currently produces the result 2. With the change, it would produce a type error unless a change is made to the Python core (currently under discussion). But x[x[0]] would still work because we have control of that.

In short, we do not think this change will break much code and it will prevent the writing of more code that is either broken or difficult to write correctly.
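
The scalar change described above can be illustrated without Numeric or Numarray installed. What follows is a minimal pure-Python sketch, not code from either library: `RankZero` and `TinyArray` are hypothetical stand-ins invented for this illustration, and the behaviour shown for the built-in list is simply what Python does with an index object it does not recognise as an integer.

```python
class RankZero:
    """Hypothetical stand-in for a rank-zero array (shape ()) holding one value."""

    shape = ()

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Arithmetic keeps working and the result stays a rank-zero object.
        other_value = other.value if isinstance(other, RankZero) else other
        return RankZero(self.value + other_value)


class TinyArray:
    """Hypothetical 1-D container whose indexing always returns RankZero."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        # The array type controls this code path, so it can unwrap its own
        # rank-zero objects before using them as an index.
        if isinstance(index, RankZero):
            index = index.value
        return RankZero(self._values[index])


x = TinyArray([1, 2, 3])

# x[x[0]] works because TinyArray understands RankZero indices.
inner = x[x[0]]

# A built-in list does not, so the same index raises TypeError.
try:
    [1, 2, 3][x[0]]
    list_index_error = None
except TypeError as exc:
    list_index_error = exc
```

In this sketch `inner.value` is 2 and `list_index_error` holds the `TypeError`. Python later gained the `__index__` hook (PEP 357), which lets objects of this kind be accepted as indices by built-in sequences.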
Paul F Dubois wrote:
Numarray's Design Paul F. Dubois and Perry Greenfield
a very nice design, for a lot of challenging decisions
i have a c extension that does this, but only during "creation time" of the array. i'm hoping there can be some way to do this from C. i need to create a new array from a block of numbers that aren't contiguous...

    /* roughly snipped code */
    dim[0] = myimg->w;
    dim[1] = myimg->h;
    dim[2] = 3; /*r,g,b*/
    array = PyArray_FromDimsAndData(3, dim, PyArray_UBYTE, startpixel);
    array->flags = OWN_DIMENSIONS|OWN_STRIDES;
    array->strides[2] = pixelstep;
    array->strides[1] = myimg->pitch;
    array->strides[0] = myimg->format->BytesPerPixel;
    array->base = myimg_object;

note this data is image data, and i am "reorienting" it so that the first index is X and the second index is Y. plus i need to account for an image pitch, where the rows are not exactly the same width as the number of pixels. also, i am changing the "base" field, since the data for this array lives inside another image object.

of course, once the array is created, i pass it off to the user and never touch these fields again, so perhaps something like this will work in the new numarray? if not, i'm eager to start my petition for a "PyArray_FromDimsAndDataAndStrides" function, and also a way to assign the "base" as well.

i'm looking forward to the new numarray, looks very exciting.
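
The stride assignments in the snippet above amount to a small piece of address arithmetic, sketched here in pure Python. This is an illustration only, not Numeric or numarray code: the 2x2 image, its pitch of 8 bytes, the pixel values, and the helper names `strided_offset` and `pixel` are all made up for the example.

```python
def strided_offset(index, strides):
    """Byte offset of an element: the sum of index[k] * strides[k] over all axes."""
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical 2x2 RGB image: 3 bytes per pixel, each row padded to 8 bytes.
bytes_per_pixel = 3
pitch = 8        # bytes from the start of one row to the start of the next
pixelstep = 1    # bytes between the r, g and b components of one pixel

buf = bytes([
    10, 11, 12, 20, 21, 22, 0, 0,   # row y=0: pixel x=0, pixel x=1, 2 pad bytes
    30, 31, 32, 40, 41, 42, 0, 0,   # row y=1: pixel x=0, pixel x=1, 2 pad bytes
])

# Same axis order as in the C snippet: index 0 is X, index 1 is Y, index 2 is colour.
strides = (bytes_per_pixel, pitch, pixelstep)


def pixel(x, y):
    """Read the (r, g, b) triple at column x, row y through the strides."""
    return tuple(buf[strided_offset((x, y, c), strides)] for c in range(3))
```

With these numbers `pixel(1, 0)` reads bytes 3 to 5 and `pixel(0, 1)` reads bytes 8 to 10, so the padding bytes at the end of each row are never touched. Nothing is copied or rearranged; only the strides describe the layout.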
"Paul F Dubois" <paul@pfdubois.com> writes:
[...]
[...]
[...]

As one of the people who argued for interface changes in numarray (mainly copy semantics for slicing), let me say that I welcome this announcement, which clarifies many issues. Although I still believe that copy behavior would be preferable in principle, I think that continuity and backwards compatibility with Numeric is a sufficient reason to stick to the old behavior (now that numarray strives to be largely compatible) [1]. In a similar vein I also greatly welcome the change to view semantics in MA, because I feel that internal consistency is vital.

Apart from being a heavy Numeric user, these interface issues are also quite important to me because I have been working for some time on a fully-featured matrix [2] class which I wanted to be both

a) compatible with Numeric and numarray (so that it would ideally make no difference to the user which of the 2 libraries he'd be using as a "backend" to the matrix class).

b) consistent in usage with numarray's interface wherever feasible (i.e. not too much of a compromise on usability).

This turned out to be much more of a hassle than I would have anticipated, because contrary to what the compatibility section of the manual seemed to suggest I found numarray to be incompatible in a variety of ways (even making it impossible to write *forward* compatible code without writing additional wrapping functions). Just as an example, there was no simple way that would work across both versions to do something as common as creating e.g. an int array (with both parameter names and positions differing):

    Numeric (21):       array(sequence, typecode=None, copy=1, savespace=0)
    numarray (0.3.3?):  array(buffer=None, shape=None, type=None)

As for b) this obviously turned out to be a moving target, but I hope that now the final shape of things is getting reasonably clear and I'm now for example determined to have view slicing behavior for my matrix class, too.

Nonetheless, for me a few issues still remain.
Most importantly, numarray doesn't provide the same degree of polymorphism as Numeric. One of the chief reasons given as to why Numeric's design is based around functions rather than methods is that it enables greater generality (e.g. allowing one to ``sum`` over all sorts of sequence types). Consequently the role of methods and attributes was largely limited to functionality that only made sense for array objects and special methods. This is more than just a neat convenience -- because of the resulting polymorphism it is easy to write fairly general code and define new kinds of numeric classes that can seamlessly be passed to Numeric functions (e.g. one can also ``sum`` Matrix'es).

I find it highly undesirable that numarray apparently doesn't follow this design rationale and the division of labour between functions and methods/attributes has been blurred (or so it appears to me -- maybe this is some lack of insight on my part). That numarray versions before 0.3.4 were missing functions such as ``shape`` (which is also quite handy for other sequence types) was largely an inconvenience, but the fact that numarray functions generally only operate on scalars, ``tuple``s and ``list``s (apart from obviously numarray.array's) is in my eyes a significant shortcoming. In contrast, Numeric functions would operate on any type that had an ``__array__`` method to return an array representation of itself. The explicit checking for a type that numarray uses (via constructs à la ``type(a) == types.ListType``) flies in the face of standard python sensibilities, places arbitrary limits on the kinds of objects that numarray users can conveniently work with and places a significant hurdle in the way of creating new kinds of numerical objects.

For example, the design of my matrix class depends on the fact that Numeric functions also accept objects with ``__array__`` methods (such as my matrix class).
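
The contrast between the two dispatch styles described above can be sketched in a few lines of plain Python. This is a hedged illustration rather than the code of either library: `as_values`, `strict_as_values`, `total` and `Vector` are hypothetical names, and `__array__` here returns a plain list where Numeric would return an array.

```python
def as_values(obj):
    """Protocol-based conversion: accept any sequence or any object with __array__()."""
    if hasattr(obj, "__array__"):
        # The object itself decides how to present its data.
        return obj.__array__()
    return list(obj)


def strict_as_values(obj):
    """Type-checked conversion, similar in spirit to type(a) == types.ListType."""
    if type(obj) in (list, tuple):
        return list(obj)
    raise TypeError("unsupported type: %s" % type(obj).__name__)


def total(obj, convert=as_values):
    """A generic function in the Numeric style: works on whatever convert() accepts."""
    return sum(convert(obj))


class Vector:
    """Hypothetical user-defined numeric class that is neither a list nor a tuple."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self):
        return list(self._values)


v = Vector([1, 2, 3])
protocol_result = total(v)          # accepted via __array__

try:
    total(v, convert=strict_as_values)
    strict_error = None
except TypeError as exc:            # rejected by the explicit type check
    strict_error = exc
```

Under the protocol-based converter a new class only has to provide one method to be usable everywhere; under the type check it would have to subclass or be wrapped at every call site.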
Even if I invested the substantial amount of work that would be needed to redesign a less general version that wouldn't rely on this property, one of the key virtues of my class, namely the ability to transparently replace Numeric.array's in most cases where they are used as matrices, would be lost. These two reasons would presumably be sufficient for me not to switch to numarray if I can at all avoid it, so I really hope that numarray will also grow an ``__array__`` protocol or something equivalent.

This is the only point that is really vital to me, but there are others that I'd rather see reconsidered. As I said, I liked the division of labor between functions and methods/attributes in Numeric and the motivations behind it, as far as I understand them. numarray arrays, however, have grown methods like ``argsort`` and ``diagonal`` that seem somewhat unmotivated to me (and some of which cause problems for my matrix class). Similarly, why is there e.g. a ``.rank`` attribute but a ``.type()`` method? If anything one would expect type to be an attribute and rank a method, since the type is actually some intrinsic property that needs to be stored (and could even be plausibly assigned to, with results like an ``astype`` call) whereas ``size`` and ``rank`` have no "real" existence as they are only computed from the shape and modifying them makes no sense. TMTOWTDI is the road to perl, so I'd really prefer to avoid duplicate functionality a la ``rank(a)`` and ``a.rank`` and generally reserve attributes and methods for array specific functionality.

One area where TMTOWTDI seems to have run amok (several ways to do something but IMHO all broken) is flattened representations of arrays.
All these expressions aim to produce a flattened version of ``a``: ``ravel(a)``, ``a.ravel()``, ``a.getflat()``/``a.flat``

`Aim` in this context is some sort of euphemism -- the only one for which it is possible to determine at compile time that it will do anything apart from raising an exception is ``ravel(a)`` -- not that one could know *what* it will do before the code is actually run (return a flattened copy of a or a flattened view), but never mind. Yuck. I think this really needs fixing (deprecating, rather than removing or changing incompatibly where felt necessary).

Something else, which I however consider less important: is it really necessary to have both 'type' and 'typecode'? Wouldn't it be enough to just stick with typecode, along the following lines (potentially issuing deprecation warnings where appropriate):

    a.typecode() returns a type object (e.g. Float32).
    array([1,2,3], typecode=Float32) behaves the same as array([1,2,3], typecode='d')

Float32 etc. are already defined in Numeric so it's easy to write forward-compatible code and although hunting down instances of ``if a.typecode() == 'd':`` presumably wouldn't be that difficult, incompatibility could most likely almost be eliminated by making ``Float32 == 'd'`` return true. Sticking to the old name typecode also has the advantage that it is fairly unique and unambiguous (just try grep'ing for type vs. typecode).

I must say that apart from the switch to type objects, I don't fully understand the differences in numeric types in old Numeric and numarray and the motivation behind them. As far as I can see the emphasis with Numeric was to keep flexible to different hardware and increasing word sizes (i.e. to only guarantee minimum precision) and provide some reasonable "default" size for each type (e.g. `Float` being a double precision [3]). This approach is maybe somewhat similar to python core (floats and ints can have different sizes, depending on the underlying platform).
In numarray the emphasis seems to have shifted to guaranteeing the actual size in memory (if in a few years time most calculations are done with 128bit precision then that's maybe not such a good idea, but I have no clue how likely this is to happen). Is this shift of emphasis also responsible for the decision to have indexing operations always return arrays rather than scalars (including ones defined by numarray in cases where there is no plain-python equivalent)? Will all other functions (e.g. min) continue to return scalars? [BTW can anyone explain to me the difference between Int and Int32 (typecodes 'i' and 'l')?]

Anyway, my apologies if I come across as too negative or if some of the points are misinformed. I really think that the recent changes to numarray and this announcement are a great step forward to a smooth transition of the whole community from Numeric to numarray, which will play an important role in consolidating python's role in scientific computing.

night,

alex

Footnotes:

[1] I think it might be beneficial, however, to add an explicit note to the manual that alerts users to the fact that small slices can keep alive very large arrays, because I am under the impression that this is not immediately obvious to everyone and can cause puzzling problems.

[2] I moaned on this list some months ago that doing linear algebra with Numeric array's was often cumbersome and inefficient (and the Matrix class that already comes with Numeric is rather limited). My (currently alpha) matrix class attempts to address these issues and also provides a much more flexible 'pluggable' output formatting (matlab-like, amongst others, which I guess many people will find much more readable; but the standard array-like formatting is also available).

[3] As an aside: maybe ``type="Float"`` in numarray should therefore *not* be equivalent to ``type=Float32`` but to ``type=Float64``, given that these strings seem to just be there for backwards compatibility?
-- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/
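
The point of footnote [1] above, that a small slice with view semantics keeps the whole underlying buffer alive, can be shown with the standard library's `memoryview`. This is only an analogy chosen so that the example runs without Numeric or numarray; the buffer size and variable names are arbitrary.

```python
big = bytearray(1_000_000)        # stands in for a very large array

small_copy = bytes(big[:10])      # slicing a bytearray copies: 10 independent bytes
small_view = memoryview(big)[:10] # slicing a memoryview does not copy: it is a view

# Writing through the view changes the original buffer; the copy is unaffected.
small_view[0] = 255
first_byte_of_big = big[0]
first_byte_of_copy = small_copy[0]

# The 10-element view still references the full 1,000,000-byte buffer,
# so that buffer cannot be freed while the view exists.
bytes_kept_alive = len(small_view.obj)

# Dropping our own name for the buffer does not release it either.
del big
still_reachable = len(small_view.obj)
```

Here the view has length 10 while `small_view.obj` is still the full million-byte buffer. Taking an explicit copy of the slice, as `small_copy` does, is the usual way to let the large object go.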
participants (3)
- Alexander Schmolck
- Paul F Dubois
- Pete Shinners