[Python-Dev] Array Enhancements

Scott Gilbert xscottg@yahoo.com
Fri, 5 Apr 2002 12:30:29 -0800 (PST)

I noticed in the newsgroup that someone suggested adding the new
boolean type to the array module.  I was pleased to see that an
appropriate patch adding this feature would be accepted.  I would find
this useful, and since I'm on the verge of writing (yet another) array
module, I'm curious whether any of the following would be accepted.

Translation: Since none of the existing array modules that I'm aware of
(array, Numeric, Numarray) meet all of my needs, I'd be happy to submit
a patch for arraymodule.c if I can make enough changes to meet all of
my needs.  Otherwise, I'll have to write an internal 'xarray' module
for my company to get stuff done with.

I doubt this warrants a PEP, so I humbly propose:

*** Adding a new typecode 't' to implement a bit array.  Implementation
would be an array of bytes, but it would be 1 bit per element.  't' is
for 'truth value' since 'b' for 'boolean' was already taken by 'b' for
'byte'.  Accepted values would be 0, 1, True, False.  Looking at the
arraymodule.c, this seems like the most work out of any of these
suggestions because all of the other types are easily addressable. 
Slicing operations are going to be tricky to do quickly and correctly. 
It was already acknowledged by GvR that this would be desirable.
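
To make the storage layout concrete, here is a minimal pure Python sketch
of the idea: one bit per element, packed into an array of bytes.  The
class name BitArray and the choice of least significant bit first within
each byte are my assumptions for illustration, not anything in
arraymodule.c.  Slicing and negative indexes are left out on purpose,
since those are the tricky parts.

```python
class BitArray:
    """Hypothetical sketch of a 't' style array: 1 bit per element."""

    def __init__(self, length):
        self._length = length
        # Round up to a whole number of bytes; every bit starts at 0.
        self._data = bytearray((length + 7) // 8)

    def __len__(self):
        return self._length

    def _check(self, index):
        if not 0 <= index < self._length:
            raise IndexError("bit index out of range")

    def __getitem__(self, index):
        self._check(index)
        byte, bit = divmod(index, 8)
        return bool((self._data[byte] >> bit) & 1)

    def __setitem__(self, index, value):
        # True == 1 and False == 0, so this accepts 0, 1, True and False.
        if value not in (0, 1):
            raise ValueError("elements must be 0, 1, True or False")
        self._check(index)
        byte, bit = divmod(index, 8)
        if value:
            self._data[byte] |= 1 << bit
        else:
            self._data[byte] &= ~(1 << bit) & 0xFF


bits = BitArray(20)   # 20 elements fit in 3 bytes
bits[3] = True
bits[17] = 1
bits[3] = 0           # clear the bit again
```

Single element access is simple enough; the cost shows up in slices that
do not start on a byte boundary, where every byte has to be shifted.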

*** Adding pickle support.  This is a no-brainer I think.  Currently we
have to call tostring() if we want to serialize the data.
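
For reference, the manual workaround looks roughly like the sketch below.
It is written against later Python versions, where tostring() and
fromstring() are spelled tobytes() and frombytes().  The helper names
dump_array and load_array are made up for this example.

```python
import array
import pickle


def dump_array(a):
    """Serialize an array by hand as a (typecode, raw bytes) pair."""
    return pickle.dumps((a.typecode, a.tobytes()))


def load_array(blob):
    """Rebuild an array from the pair written by dump_array."""
    typecode, raw = pickle.loads(blob)
    a = array.array(typecode)
    a.frombytes(raw)
    return a


original = array.array('d', [1.0, 2.5, -3.0])
restored = load_array(dump_array(original))
```

Note that the raw bytes are in the machine's native layout, so a blob
written this way is only safe to read back on a platform with the same
element size and byte order.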

*** Changing the 'l' and 'L' typecodes to use LONG_LONG.  There isn't a
consistent way to get an 8 byte integer out of the array module.  About
half of our machines at work are Alphas where long is 8 bytes, and the
rest are Sparcs and x86's where long is 4 bytes.

*** I'd really like it if the array module gave a "hard commitment" to
the sizes of the elements instead of just saying "at least n bytes".
None of the other array modules do this either.  I know Python has been
ported to a bazillion platforms, but what are the exceptions to 'char'
being 8 bits, 'short' being 16 bits, 'int' being 32 bits, 'long long'
or __int64 being 64 bits, 'float' being 32 bits, and 'double' being 64
bits?  I know that an int is 16 bits on Win16, but does Python live
there any more?  Even so, there is a 32 bit int type on Win16 as well.

I guess changing the docs to give a "hard commitment" to this isn't
such a big deal to me personally, because the above are true for every
platform I think I'll need this for (alpha, x86, sparc, mips).
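
Until there is such a commitment, the sizes can at least be checked at
run time through the itemsize attribute.  The sketch below prints the
size of each integer and float typecode on the current platform, and the
hypothetical helper require_size (my name, not part of the module) fails
loudly when an assumption does not hold.

```python
import array

# Report the element size of each numeric typecode on this platform.
for code in "bBhHiIlLfd":
    print(code, array.array(code).itemsize)


def require_size(typecode, nbytes):
    """Return typecode if its elements are exactly nbytes wide, else raise."""
    actual = array.array(typecode).itemsize
    if actual != nbytes:
        raise RuntimeError(
            "typecode %r is %d bytes here, expected %d"
            % (typecode, actual, nbytes))
    return typecode


# On the platforms listed above a double is 8 bytes, so this should pass.
samples = array.array(require_size('d', 8), [0.0, 1.0])
```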

*** In the absence of fixing the 'l' and 'L' types, adding new
typecodes ('n' and 'N' perhaps) that do use LONG_LONG.  This seems more
backwards compatible, but all it really does is make the 'l' and 'L'
typecodes duplicate either 'i' or 'n' depending on the platform
specific sizeof(long).  In other words, if an 'n' typecode was added,
who would want to use the 'l' one?  I suppose someone who knew they
wanted a platform specific long.

*** I really need complex types. And more than the functionality
provided by Numeric/Numarray, I need complex integer types.  We
frequently read hardware that gives us complex 16 or 32 bit integers,
and there are times when we would use 32 or 64 bit fixed point complex
numbers.  Sometimes we scale our "soft decision" data so that it would
fit fine in a complex 8 bit integer.  This could be easily added in one
of two ways: either adding a 'z' prefix to the existing typecodes, or
by creating new typecodes like such:

   'u' - complex bytes (8 bit)
   'v' - complex shorts (16 bit)
   'w' - complex ints (32 bit)
   'x' - complex LONG_LONGs (64 bit)
   'y' - complex floats (32 bit)
   'z' - complex doubles (64 bit)

The downside to a 'z' prefix is that typecodes could now be 2
characters 'zi', 'zb', and that would be a bigger change to the
implementation.  It's also silly to have complex unsigned types (who
wants complex numbers that are only in one quadrant?).

The downside to adding 'u', 'v', 'w', 'x', 'y', 'z' is that they aren't
very mnemonic, and the namespace for typecodes is getting pretty big.

Also, I'm unsure how to get the elements in and out of the typecode for
'x' above (a 2-tuple of PyInt/PyLongs?).  Python's complex type is
sufficient to hold the others without losing precision.
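
As a rough illustration of what a complex integer type could look like
from Python, the sketch below stores complex 16 bit integers as
interleaved (real, imaginary) shorts in an ordinary 'h' array.  The class
name ComplexInt16Array and the interleaved layout are assumptions of
mine.  Elements come out as 2-tuples of ints, with a separate method that
converts to Python's complex type.

```python
import array


class ComplexInt16Array:
    """Hypothetical sketch: complex shorts as interleaved (re, im) pairs."""

    def __init__(self, values=()):
        self._data = array.array('h')
        for value in values:
            self.append(value)

    def __len__(self):
        return len(self._data) // 2

    def append(self, value):
        re, im = value
        # Build the pair first so an out of range part raises OverflowError
        # before anything is added to the underlying array.
        self._data.extend(array.array('h', (re, im)))

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return (self._data[2 * index], self._data[2 * index + 1])

    def as_complex(self, index):
        """Return the element as a Python complex (exact for 16 bit parts)."""
        re, im = self[index]
        return complex(re, im)


samples = ComplexInt16Array([(1, -2), (300, 400)])
second = samples[1]
first = samples.as_complex(0)
```

The 2-tuple form would also work for 64 bit parts, where converting to a
Python complex would lose precision.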

*** The ability to construct an array object from an existing C
pointer.  We get our memory in all kinds of ways (valloc for page
aligned DMA transfers, shmem, etc.), and it would be nice not to copy
in and copy out in some cases.
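
The array module itself cannot wrap foreign memory, but the effect I am
after can be sketched with tools from later Python versions.  Below, an
anonymous mmap stands in for memory obtained elsewhere (that substitution
is an assumption for the example), and memoryview.cast gives a typed,
array-like view of it without copying.  The result is a memoryview, not
an array.array object.

```python
import mmap
import struct

n = 4
itemsize = struct.calcsize('d')

# Anonymous mapping, zero filled; a stand-in for shmem or DMA memory.
shared = mmap.mmap(-1, n * itemsize)

# Typed view over the same memory: no copy in, no copy out.
view = memoryview(shared).cast('d')
view[0] = 1.5
view[3] = -2.0

# Read the first double straight out of the mapping to show that the
# write through the view landed in the underlying memory.
first = struct.unpack_from('d', shared, 0)[0]
count = len(view)
values = view.tolist()

# The view must be released before the mapping can be closed.
view.release()
shared.close()
```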

*** Adding an additional signature to the array constructor that
specifies a length instead of initial values.  

   a = array.array('d', [1, 2, 3])

would work as it currently does, but

   a = array.array('d', 30000)

would create an array of 30000 doubles.  My current hack to accomplish
this is to create a small array and use the Sequence operation * to
create an array of the size I really want:

   a = array.array('d', [0])*300000

Besides creating a (small) temporary array, this calls memcpy 300000
times.  Yuk.

*** In the absence of the last one, adding a new constructor:

   a = array.xarray('d', 30000)

would create an array of 30000 doubles.

*** If a signature for creating large arrays is accepted, an optional
keyword parameter to specify the value of each element:

   a = array.xarray('d', 30000, val=1.0)

The default val would be None indicating not to waste time initializing
the array.
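
A pure Python stand-in for the proposed signature might look like the
sketch below.  It is only an approximation: the function xarray here is
hypothetical, and Python code cannot return uninitialized memory, so with
val=None it zero fills instead of skipping initialization.  It relies on
the behavior of later Python versions, where a bytes initializer is
treated as raw element data.

```python
import array


def xarray(typecode, length, val=None):
    """Hypothetical stand-in for the proposed length-based constructor."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if val is None:
        # All-zero bytes decode as 0 or 0.0 for the numeric typecodes.
        itemsize = array.array(typecode).itemsize
        return array.array(typecode, bytes(length * itemsize))
    # Otherwise repeat a one element array, as in the hack above.
    return array.array(typecode, [val]) * length


a = xarray('d', 30000)            # 30000 doubles, zero filled
b = xarray('d', 30000, val=1.0)   # 30000 doubles, each 1.0
```

A real implementation in C could allocate the buffer once and either
leave it alone or fill it in a single pass.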

*** Multi-dimensional arrays would be valuable too, but this might be
treading too much in the territory of the Numarray guys.  (I really
wish there was a completed one size fits all of my needs array module.)
I would propose the following for multi-dimensional arrays:

   a = array.array('d', 20000, 20000)

or

   a = array.xarray('d', 20000, 20000)
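
To show what I mean, here is a small sketch of a two-dimensional array
layered on a flat one.  The class name Array2D, the row-major layout, and
the (row, col) tuple indexing are all assumptions for illustration; the
proposal above does not pin any of that down.  The example uses a small
shape so it runs quickly.

```python
import array


class Array2D:
    """Hypothetical sketch: 2-D indexing over a flat, zero filled array."""

    def __init__(self, typecode, rows, cols):
        self.rows = rows
        self.cols = cols
        itemsize = array.array(typecode).itemsize
        # One contiguous buffer holding rows * cols elements.
        self._data = array.array(typecode, bytes(rows * cols * itemsize))

    def _offset(self, key):
        row, col = key
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("index out of range")
        # Row-major: elements of one row are adjacent in memory.
        return row * self.cols + col

    def __getitem__(self, key):
        return self._data[self._offset(key)]

    def __setitem__(self, key, value):
        self._data[self._offset(key)] = value


grid = Array2D('d', 200, 300)
grid[10, 20] = 3.5
```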

Well, if someone authoritative tells me that all of the above is a great
idea, I'll start working on a patch and scratch my plans to create a
"not in house" xarray module.

    -Scott Gilbert

