[Python-Dev] Array Enhancements

Scott Gilbert xscottg@yahoo.com
Mon, 8 Apr 2002 00:52:43 -0700 (PDT)

Thanks for the various replies.  As suggested by a few, I'll take this
to the Numarray folks and see where it goes from there.

Just to respond to a few of the points though...  I've put all my
responses in one message to wrap things up.

Tim Peters wrote:
> Sounds like a PEP to me.

My initial response to reading this was a loud "ugh" as I envisioned
red tape swarming around for what I would consider to be a pretty
simple patch.  I mean, I just wanted to hack in some new typecodes...

After thinking about things for a while though, I've come to the
conclusion that the builtin Python array module does need a real
reworking.  Even though one ships with the standard baseline, it's
getting reinvented again and again.

I hope the Numarray guys are doing a bang-up job with their NDArray
type (I've looked at it briefly, but I don't really understand it
yet...).  I suspect that most of the ufuncs and other stuff those guys
are doing are too special-purpose to be part of the standard Python
baseline, but I would very much like to see a single usable array type
become the standard.  I'd be willing to do PEP grunt work for that.

Tim Peters also wrote:
> > ...
> > *** I really need complex types. And more than the functionality
> > provided by Numeric/Numarray, I need complex integer types.
> This will meet resistance, as it's a pile of code of no conceivable 
> use to the vast majority of Python users.  That is, "code bloat".  
> Instead the array type should be subclassable, and extreme 
> special-purpose hair like "complex integers" should be supplied by 
> extension modules.

It's not that much bloat.  It would be a setitem and getitem pair for
each new type.
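To make the "setitem and getitem pair" concrete, here is a rough C sketch of a per-typecode descriptor table. It is loosely in the spirit of arraymodule.c but is not its actual code: the names `descr`, `find_descr` and `value`, the `cint16` element type, and the typecode 'J' are all made up for illustration, and `value` stands in for the PyObject* conversions a real getitem/setitem would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element type: complex number with 16-bit integer parts. */
typedef struct { int16_t re, im; } cint16;

/* Stand-in for the PyObject* a real getitem would return. */
typedef struct { double re, im; } value;

/* One row per typecode: item size plus a getitem/setitem pair. */
typedef struct {
    char typecode;
    size_t itemsize;
    value (*getitem)(const unsigned char *buf, size_t i);
    void (*setitem)(unsigned char *buf, size_t i, value v);
} descr;

/* Existing-style type: signed 16-bit integer ('h'). */
static value h_get(const unsigned char *buf, size_t i) {
    int16_t x;
    memcpy(&x, buf + i * sizeof x, sizeof x);
    value v = {x, 0};
    return v;
}

static void h_set(unsigned char *buf, size_t i, value v) {
    assert(v.im == 0);            /* real code would raise TypeError instead */
    int16_t x = (int16_t)v.re;    /* no range check in this sketch */
    memcpy(buf + i * sizeof x, &x, sizeof x);
}

/* New type: this pair is essentially all the per-type code it needs. */
static value cj_get(const unsigned char *buf, size_t i) {
    cint16 c;
    memcpy(&c, buf + i * sizeof c, sizeof c);
    value v = {c.re, c.im};
    return v;
}

static void cj_set(unsigned char *buf, size_t i, value v) {
    cint16 c = {(int16_t)v.re, (int16_t)v.im};
    memcpy(buf + i * sizeof c, &c, sizeof c);
}

static const descr descriptors[] = {
    {'h', sizeof(int16_t), h_get, h_set},
    {'J', sizeof(cint16), cj_get, cj_set},   /* 'J' is a made-up typecode */
};

/* Linear lookup by typecode; NULL if the typecode is unknown. */
static const descr *find_descr(char typecode) {
    for (size_t k = 0; k < sizeof descriptors / sizeof descriptors[0]; k++)
        if (descriptors[k].typecode == typecode)
            return &descriptors[k];
    return NULL;
}
```

Adding the complex type costs one struct, two small functions, and one table row; the 'h' entry is untouched.  Calling setitem and then getitem at the same index of a 'J' buffer round-trips a value such as 3-4j.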

I'll give you that most people don't need "fixed point complex arrays".

Guido van Rossum wrote:
> You'll have to consider: is it important to be able to read pickled
> arrays on previous Python releases, or is that not a requirement?  If
> it's not, you should probably add a new pickle code for pickled
> arrays, and do an implementation that writes;

Nope, we ship the version of Python we want them to use with our

Did you guys really make it possible to unpickle a Unicode string in
versions of Python that predate Unicode?

I would think new features should only work in new versions...

Guido also wrote:
> Ehm, 'u' is already taken (Unicode).

That must have snuck in there sometime after 2.2, I guess.

Guido also wrote:
> > *** The ability to construct an array object from an existing C
> > pointer.  We get our memory in all kinds of ways (valloc for page
> > aligned DMA transfers, shmem etc...), and it would be nice not to 
> > copy in and copy out in some cases.
> But then you get into ownership issues.  Who owns that memory?  Who
> can free it?  What if someone calls a method on the array that
> requires the memory to be resized?
> But it's a useful thing to be able to do, I agree, and it shouldn't
> be too hard to add a flag that says "I don't own this memory" -- which
> would mean that the buffer can't be resized at all.

I pictured this working the way CObjects do, where you pass in a
destructor to be called when the reference count goes to zero, and
possibly also a realloc function.  If the realloc function is NULL,
then an exception is raised when someone tries to resize the array.

This means there would need to be a C-visible API for building array
objects around special types of memory, though.
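The CObject-style idea can be sketched in plain C. Everything here (`extarray`, `extarray_from_pointer`, the callback typedefs, `free_destructor`, `std_realloc`) is a hypothetical API invented for illustration, not anything that exists in Python; reference counting is simplified to a plain counter, and a `-1` return stands in for raising a Python exception.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Callbacks supplied by whoever owns the memory (hypothetical API). */
typedef void (*ext_destructor)(void *data, void *context);
typedef void *(*ext_realloc)(void *data, size_t newsize, void *context);

typedef struct {
    long refcount;
    void *data;
    size_t nbytes;
    ext_destructor destroy;  /* called at refcount zero; NULL = do nothing */
    ext_realloc resize;      /* NULL = the buffer can never be resized */
    void *context;           /* passed through to both callbacks */
} extarray;

/* Wrap existing memory (valloc'd, shared, ...) without copying it. */
extarray *extarray_from_pointer(void *data, size_t nbytes,
                                ext_destructor destroy, ext_realloc resize,
                                void *context) {
    extarray *a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->refcount = 1;
    a->data = data;
    a->nbytes = nbytes;
    a->destroy = destroy;
    a->resize = resize;
    a->context = context;
    return a;
}

void extarray_incref(extarray *a) { a->refcount++; }

void extarray_decref(extarray *a) {
    assert(a->refcount > 0);
    if (--a->refcount == 0) {
        if (a->destroy != NULL)
            a->destroy(a->data, a->context);
        free(a);
    }
}

/* 0 on success; -1 where the Python-level method would raise an exception. */
int extarray_resize(extarray *a, size_t newsize) {
    if (a->resize == NULL)
        return -1;
    void *p = a->resize(a->data, newsize, a->context);
    if (p == NULL)
        return -1;
    a->data = p;
    a->nbytes = newsize;
    return 0;
}

/* Ready-made callbacks for memory that came from malloc. */
void free_destructor(void *data, void *context) { (void)context; free(data); }

void *std_realloc(void *data, size_t newsize, void *context) {
    (void)context;
    return realloc(data, newsize);
}
```

A shmem segment, for example, would typically be wrapped with a destructor that detaches it and a NULL realloc, so any attempt to grow the array fails cleanly instead of touching memory the array doesn't own.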

Guido also wrote:
> Since arrays are all about compromises that trade flexibility for
> speed and memory footprint, you can't have a one size fits all. :-)

Bahh.  I don't think getting a good general purpose Python object that
represents arbitrary C arrays is all that impossible.  C arrays just
don't do that much.

Besides I didn't say "one size fits all", I said "one size fits all my
needs".  That "my" is important (at least to me :-)

Guido also wrote:
> > Well if someone authoritative tells me that all of the above is a
> > great idea, I'll start working on a patch and scratch my plans to
> > create a "not in house" xarray module.
> It all depends on the quality of the patch.  By the time you're done
> you may have completely rewritten the array module, and then the
> question is, wouldn't your own xarray module have been quicker to
> implement, because it doesn't need to preserve backwards
> compatibility?

Yup, I think I would be done with my xarray module by now if I had
written it instead of taking this route.  It would also have the
disadvantage that it doesn't play nice with anyone else.

I now think the best bet is to replace the array module with something
flexible enough to:

  1) do what it currently does
  2) do what the Numarray guys need
  3) do what I need

Guido also wrote:
> An alternative might be a separate bit-array implementation: it seems
> that the bit-array won't share much code with the regular array (of
> any flavor), so why not make it a separate type?

Yup.  It would be nice if a bitarray was actually the same type, but
having code like:

   if (o->is_bitarray) {
      /* do something */
   } else {
      /* do every other byte addressable type */
   }
is a little ugly.
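For a sense of why a bit array shares so little code with the byte-addressable types, here is a minimal sketch of bit-level get/set in C. The function names are hypothetical, and packing bits least-significant-bit first is an arbitrary assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bit i lives in byte i / CHAR_BIT at position i % CHAR_BIT
   (least-significant bit first, an arbitrary choice here). */
int bit_get(const unsigned char *buf, size_t i) {
    return (buf[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

void bit_set(unsigned char *buf, size_t i, int value) {
    assert(value == 0 || value == 1);
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));
    if (value)
        buf[i / CHAR_BIT] |= mask;
    else
        buf[i / CHAR_BIT] &= (unsigned char)~mask;
}

/* Bytes needed to hold nbits bits, rounded up. */
size_t bit_nbytes(size_t nbits) {
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}
```

Every element access needs a shift and a mask, and sizes no longer map to a whole number of bytes per item, which is why the byte-oriented getitem/setitem and resize code can't simply be reused.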

David Ascher wrote:
> > I just realized that multi-dimensional __getitem__ shouldn't be a
> > big deal.  The question is, given the above declaration, what a[0] 
> > should return: the same as a[0, 0] or a copy of a[0, 0:20000] or
> > a reference to a[0, 0:20000].
> Or a ValueError?  In the face of ambiguity, refuse the temptation to
> guess.

Yup.  I think there should be a base array type that raises a
ValueError or similar, and derived array types can implement slice
references or slice copies as need be.
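A minimal C sketch of that "refuse to guess" base behavior: a full index returns one element, while a partial index like a[0] on a 2-D array is rejected. The struct, the function name, and the error strings are hypothetical; a real extension type would set a ValueError through the Python C API rather than hand back a message.

```c
#include <assert.h>
#include <stddef.h>

#define MAXDIM 8

typedef struct {
    size_t ndim;
    size_t shape[MAXDIM];
    const double *data;     /* contiguous, row-major (C order) */
} ndarray_sketch;

/* Base-type lookup: succeeds only for a full index (one per dimension).
   A partial index is ambiguous, so it is rejected instead of guessed at;
   a derived type could return a copy of, or a reference to, the row.
   Returns 0 on success, -1 on error with *error set. */
int base_getitem(const ndarray_sketch *a, const size_t *index, size_t nindex,
                 double *out, const char **error) {
    assert(a->ndim <= MAXDIM);
    if (nindex != a->ndim) {
        *error = "ValueError: partial index is ambiguous for the base type";
        return -1;
    }
    size_t offset = 0;
    for (size_t k = 0; k < nindex; k++) {
        if (index[k] >= a->shape[k]) {
            *error = "IndexError: index out of range";
            return -1;
        }
        offset = offset * a->shape[k] + index[k];  /* row-major offset */
    }
    *out = a->data[offset];
    return 0;
}
```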

David also wrote:
> Why does submitting a patch to arraymodule seem an easier path than
> modifying numarray or numpy to support what's needed?  I believe that
> the goals of numarray aren't that different from what Scott is trying
> to do (memory management APIs, etc.).

Well, part of my preference for modifying arraymodule.c instead of
Numarray is that I very quickly understood what's going on in
arraymodule.c, and a patch is pretty obvious.  Looking at Numarray, I
just don't get it yet.  Please take this as a shortcoming in my
abilities.  Numarray does appear to be the heir-apparent though, so
I'll give it a better look.

I also assumed that the Numarray folks would play nice with the
standard array module.  So if I could get what I wanted out of array,
then I could leverage Numarray when the opportunity arose.

David also wrote:
> I'd like to see fewer multi-dimensional array objects, not more...

I agree completely.  In fact, I'd like to see one official one
distributed with the baseline.

Perry Greenfield wrote:
> [ a whole bunch of interesting things ]

I think I'll try to bring those up on the Numarray list.

    -Scott Gilbert
