[Python-Dev] Array Enhancements

Perry Greenfield perry@stsci.edu
Fri, 5 Apr 2002 17:26:15 -0500


> -----Original Message-----
> From: eric [mailto:eric@enthought.com]
> Sent: Friday, April 05, 2002 3:32 PM
> To: python-dev@python.org
> Cc: Scott Gilbert; Perry Greenfield; travis oliphant
> Subject: Re: [Python-Dev] Array Enhancements
>
>
> Hey Scott,
>
> You should consider taking this proposal to the Numeric discussion list.
>
>     numpy-discussion@lists.sourceforge.net
>
> Numarray is still in the formative stages (though already pretty far
> along), and some of your suggestions might make it into the
> specification.  Multi-dimensional arrays for numeric calculations will
> (hopefully) make it into the core at some point.  Certainly numarray is
> a candidate for this.
>
> > Translation: Since none of the existing array modules that I'm aware of
> > (array, Numeric, Numarray) meet all of my needs, I'd be happy to submit
> > a patch for arraymodule.c if I can make enough changes to meet all of
> > my needs.  Otherwise, I'll have to write an internal 'xarray' module
> > for my company to get stuff done with.
>
> If your needs are general purpose enough to fit in the core, they should
> definitely be discussed for additions to numarray.
>
> >
> > I doubt this warrants a PEP, so I humbly propose:
> >
> >
> > *** Adding a new typecode 't' to implement a bit array.  Implementation
> > would be an array of bytes, but it would be 1 bit per element.  't' is
> > for 'truth value' since 'b' for 'boolean' was already taken by 'b' for
> > 'byte'.  Accepted values would be 0, 1, True, False.  Looking at the
> > arraymodule.c, this seems like the most work out of any of these
> > suggestions because all of the other types are easily addressable.
> > Slicing operations are going to be tricky to do quickly and correctly.
> > It was already acknowledged by GvR that this would be desirable.
>
> A bit type for numarray has been discussed at least briefly.  The
> implementation for a 1 bit per element array is tricky and hasn't been
> too high on anyone's implementation list.  If you were to propose this
> and offer to implement it, the Space Telescope guys might accept it.
> It definitely needs discussion though.
>
Bit arrays have definitely been discussed. We aren't opposed to them
at all (and could use them ourselves). On the other hand, they aren't
high enough priority for us to implement them within the next 6 months
(or even year). The problem with bit arrays is that, for the most part,
they would not use much of the existing numarray code, so adding them is
a moderate amount of work if you want them to be efficient in both speed
and memory. But if someone else wants to do the implementation, we would
welcome it. (But we don't use typecodes so 't' is right out :-)
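
To make the packing concrete, here is a minimal pure-Python sketch of a
1 bit per element array. The class name BitArray and its layout (least
significant bit first within each byte) are assumptions made for
illustration; this is not numarray code, and it leaves out slicing,
which is the part identified above as hard to do quickly and correctly.

```python
class BitArray:
    """Fixed-length array of truth values, packed 8 per byte (hypothetical)."""

    def __init__(self, length):
        self._length = length
        # Round up so a partial final byte is still allocated.
        self._bytes = bytearray((length + 7) // 8)

    def __len__(self):
        return self._length

    def _locate(self, index):
        # Map a bit index to (byte offset, single-bit mask).
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("bit index out of range")
        return index >> 3, 1 << (index & 7)

    def __getitem__(self, index):
        byte, mask = self._locate(index)
        return bool(self._bytes[byte] & mask)

    def __setitem__(self, index, value):
        byte, mask = self._locate(index)
        if value:
            self._bytes[byte] |= mask
        else:
            self._bytes[byte] &= ~mask & 0xFF
```

Every element access needs a shift and a mask, so none of the
byte-addressed element code can be reused, which is where the extra work
comes from.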

> >
> > *** Adding pickle support.  This is a no brainer I think.  Currently we
> > have to call tostring() if we want to serialize the data.
> >
>
> Numeric and (I think) numarray are picklable already.  If numarray
> isn't now, it will be I'm sure.
>
We just got a question about this. numarray doesn't currently support
pickling. But it will. We have begun to think about how to best
implement it.

> >
> > *** Changing the 'l' and 'L' typecodes to use LONG_LONG.  There isn't a
> > consistent way to get an 8 byte integer out of the array module.  About
> > half of our machines at work are Alphas where long is 8 bytes, and the
> > rest are Sparcs and x86's where long is 4 bytes.
>
> Haven't used LONG_LONG before, but it sounds like this warrants
> discussion.
>
We are proponents of type names that denote numeric types of the same
size on every machine. numarray does not yet support Int64.
Ultimately it will, but we need to give some thought to how we deal
with nonportable (i.e., not available on all platforms) numeric types.
This does warrant discussion.
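
As an illustration of what a definite size means in practice, the
standard struct module already separates the two notions: with a '<' or
'>' prefix it uses standard sizes that are the same everywhere, while
unprefixed codes follow the platform's C compiler. The dictionary below
is only a sketch; its keys are illustrative labels, not a statement of
numarray's actual type list.

```python
import struct

# Standard sizes ('<' prefix) do not depend on the platform's C types.
FIXED_SIZES = {
    "Int8": struct.calcsize("<b"),
    "Int16": struct.calcsize("<h"),
    "Int32": struct.calcsize("<i"),
    "Int64": struct.calcsize("<q"),
    "Float32": struct.calcsize("<f"),
    "Float64": struct.calcsize("<d"),
}

# Native size of a C long: 4 or 8 bytes depending on the machine.
NATIVE_LONG_SIZE = struct.calcsize("l")

# An 8-byte integer round-trips identically on every platform.
packed = struct.pack("<q", 2 ** 62)
value = struct.unpack("<q", packed)[0]
```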

> >
> > *** I'd really like it if the array module gave a "hard commitment" to
> > the sizes of the elements instead of just saying "at least n bytes".
> > None of the other array modules do this either.  I know Python has been
> > ported to a bazillion platforms, but what are the exceptions to 'char'
> > being 8 bits, 'short' being 16 bits, 'int' being 32 bits, 'long long'
> > or __int64 being 64 bits, 'float' being 32 bits, and 'double' being 64
> > bits?  I know that an int is 16 bits on Win16, but does Python live
> > there any more?  Even so, there is a 32 bit int type on Win16 as well.
> >

Well, as mentioned, we believe the numarray types should be of definite
sizes and we've implemented this so they are. We can't do anything about
Python scalars however, nor should we. We have no plans for dealing with
Win16. It doesn't exist as far as we are concerned.

> > I guess changing the docs to give a "hard commitment" to this isn't
> > such a big deal to me personally, because the above are true for every
> > platform I think I'll need this for (alpha, x86, sparc, mips).
> >
>
> I don't see this one as that big a deal.  As you say, most modern
> platforms treat them the same way.
>
> > *** In the absence of fixing the 'l' and 'L' types, adding new
> > typecodes ('n' and 'N' perhaps) that do use LONG_LONG.  This seems more
> > backwards compatible, but all it really does is make the 'l' and 'L'
> > typecodes duplicate either 'i' or 'n' depending on the platform
> > specific sizeof(long).  In other words, if an 'n' typecode was added,
> > who would want to use the 'l' one?  I suppose someone who knew they
> > wanted a platform specific long.
> >
> >
> > *** I really need complex types. And more than the functionality
> > provided by Numeric/Numarray, I need complex integer types.  We
> > frequently read hardware that gives us complex 16 or 32 bit integers,
> > and there are times when we would use 32 or 64 bit fixed point complex
> > numbers.  Sometimes we scale our "soft decision" data so that it would
> > fit fine in a complex 8 bit integer.  This could be easily added in one
> > of two ways: either adding a 'z' prefix to the existing typecodes, or
> > by creating new typecodes like such:
>
> I hadn't thought about needing integer complex numbers.  It seems
> possible to add this for numarray.  Numeric uses uppercase letters to
> represent complex versions of numbers ('f' for floats and 'F' for
> complex floats).  The same convention could be used for complex integer
> types with the exception of Int8 whose character is '1'.
>
Complex ints? We sure don't need them and I suspect that is a fairly
niche type. I would be against making them part of the core numarray.
It is possible to add such types to numarray "on the fly" or create
subclasses that support them. But there is enough bloat with existing
(and potential future common types--e.g., Int64, Int128, Float128...)
that I would avoid them as part of the base numarray.

> >
> >    'u' - complex bytes (8 bit)
> >    'v' - complex shorts (16 bit)
> >    'w' - complex ints (32 bit)
> >    'x' - complex LONG_LONGs (64 bit)
> >    'y' - complex floats (32 bits)
> >    'z' - complex doubles (64 bits)
> >
> > The downside to a 'z' prefix is that typecodes could now be 2
> > characters 'zi', 'zb', and that would be a bigger change to the
> > implementation.  It's also silly to have complex unsigned types (who
> > wants complex numbers that are only in one quadrant?).
> >
> > The downside to adding 'u', 'v', 'w', 'x', 'y', 'z' is that they aren't
> > very mnemonic, and the namespace for typecodes is getting pretty big.
> >

As mentioned, numarray doesn't directly use typecodes (though for backward
compatibility it does recognize existing ones). We would like to steer
users away from them (they are confusing enough without the proposed
additions above!).

Since so few of the ufuncs appear to make much sense for integer complex
types (transcendental functions, most comparison or bit functions...),
this is almost certainly better implemented as a subclass of NDArray
in numarray (which is almost how complex is currently implemented, though
we plan to replace that with a C implementation).
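
The general shape of such an add-on type can be sketched without
numarray at all. The class below is hypothetical: it wraps the standard
array module rather than subclassing NDArray, stores 16-bit real and
imaginary parts interleaved, and supports only element access and
addition, on the assumption that few other operations are needed.

```python
from array import array


class ComplexInt16Array:
    """Complex integers stored as interleaved (real, imag) shorts (hypothetical)."""

    def __init__(self, pairs=()):
        self._data = array("h")
        for real, imag in pairs:
            self._data.extend((real, imag))

    def __len__(self):
        return len(self._data) // 2

    def __getitem__(self, index):
        if not -len(self) <= index < len(self):
            raise IndexError("index out of range")
        index %= len(self)
        # Elements come back as 2-tuples of Python ints, so no precision is lost.
        return self._data[2 * index], self._data[2 * index + 1]

    def __iter__(self):
        return (self[i] for i in range(len(self)))

    def __add__(self, other):
        if len(self) != len(other):
            raise ValueError("length mismatch")
        # array('h') raises OverflowError if a sum does not fit.
        return ComplexInt16Array(
            (a + c, b + d) for (a, b), (c, d) in zip(self, other)
        )
```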

> > Also, I'm unsure how to get the elements in and out of the typecode for
> > 'x' above (a 2-tuple of PyInt/PyLongs?).  Python's complex type is
> > sufficient to hold the others without losing precision.
> >
> >
> > *** The ability to construct an array object from an existing C
> > pointer.  We get our memory in all kinds of ways (valloc for page
> > aligned DMA transfers, shmem etc...), and it would be nice not to copy
> > in and copy out in some cases.
> >
In one sense this is simple. But the real issue is how will Python manage
memory for these C pointers? If something else deallocates the memory,
how do you know that a numeric object or numarray object isn't using it?
Without specifying how memory management is to be done, it isn't possible
to use "outside" pointers safely. But maybe I misunderstand.
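
One way to state the ownership question is as a contract between the
owner of the memory and anything that borrows it. Later versions of
Python express this through memoryview, which postdates this thread; the
sketch below uses it only to illustrate the kind of rule that would be
needed, with the standard array module standing in for an outside
allocator.

```python
from array import array

samples = array("d", [1.0, 2.0, 3.0])

# The view shares the array's memory; nothing is copied.
view = memoryview(samples)
view[0] = 9.0
shared = samples[0] == 9.0

# While a view is outstanding, the owner refuses to resize (and so
# cannot move or free) the underlying buffer.
try:
    samples.append(4.0)
    resize_blocked = False
except BufferError:
    resize_blocked = True

# Once the borrower lets go, the owner is free to reallocate again.
view.release()
samples.append(4.0)
```

A raw C pointer carries no such bookkeeping, which is exactly the gap
described above.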

> >
> >
> > *** Adding an additional signature to the array constructor that
> > specifies a length instead of initial values.
> >
> >    a = array.array('d', [1, 2, 3])
> >
> > would work as it currently does, but
> >
> >    a = array.array('d', 30000)
> >
> > would create an array of 30000 doubles.  My current hack to accomplish
> > this is to create a small array and use the Sequence operation * to
> > create an array of the size I really want:
> >
> >    a = array.array('d', [0])*300000
> >
> > Besides creating a (small) temporary array, this calls memcpy 300000
> > times.  Yuk.
> >
>
> In Numeric,
>
>     zeros(30000,typecode='d') or
>     ones(30000,typecode='f')
>
> work for doing this.

Yup.

> >
> >
> > *** In the absence of the last one, adding a new constructor:
> >
> >    a = array.xarray('d', 30000)
> >
> > would create an array of 30000 doubles.
> >
> >
> >
> > *** If a signature for creating large arrays is accepted, an optional
> > keyword parameter to specify the value of each element:
> >
> >    a = array.xarray('d', 30000, val=1.0)
> >
> > The default val would be None indicating not to waste time initializing
> > the array.
> >
>   Initializing to a specific value can be done with:
>
>     ones(30000,typecode='f') * value
>
> This does require a multiplication, so it isn't as fast as your proposal.
> Adding your proposed function to Numeric is minimal effort -- it may
> even be lurking there now, though I have never seen or used it.
>
I really doubt that the performance penalty is worth bothering about.
If this is one of your biggest performance problems, I'd like to trade :-)
But it is trivial to add (if it isn't already in numarray).
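
For the standard array module, such a helper is only a couple of lines.
The name filled and its signature are made up for illustration; it
simply wraps the repetition idiom quoted earlier in the thread.

```python
from array import array


def filled(typecode, length, value=0):
    """Return an array of `length` elements, each equal to `value` (hypothetical helper)."""
    # Build a one-element array, then repeat it to the requested length.
    return array(typecode, [value]) * length


doubles = filled("d", 30000, 1.0)
```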

> >
> >
> > *** Multi-dimensional arrays would be valuable too, but this might be
> > treading too much in the territory of the Numarray guys.  (I really
> > wish there was a completed one size fits all of my needs array module.)
> >  I would propose the following for multi-dimensional arrays:
> >
> >    a = array.array('d', 20000, 20000)
> >
> > or:
> >
> >    a = array.xarray('d', 20000, 20000)
> >
>
or

a = zeros((20000,20000),typecode='d') # for Numeric

> Any multi-dimensional array really should come from the conventions
> used by Numeric.  It sounds like the missing features from Numarray
> that you're interested in are: bit arrays, support for LONG_LONG, and
> support for complex integers.  The other features are already in
> Numeric/numarray -- perhaps with syntax.  Is this the full picture?
> All seem reasonably general and worth discussing for numarray.
>
> By the way, numarray is written so that it can be sub-classed.  This
> means you could add the features you want in a derived class if the
> features are accepted into the standard.
>
> regards,
> eric
>
>
Eric summarizes it correctly.

numarraying'ly yours, Perry