[Python-Dev] Array Enhancements

Guido van Rossum guido@python.org
Mon, 08 Apr 2002 10:28:40 -0400


Scott Gilbert:
> I hope the Numarray guys are doing a bang up job with their NDArray
> type (I've looked at it briefly, but I don't really understand it
> yet...).  I suspect that most of the ufuncs and other stuff those guys
> are doing are too special purpose to be part of the standard Python
> baseline, but I would very much like to see a single usable array type
> become the standard.  I'd be willing to do PEP grunt work for that.

The Numarray folks already have a PEP.  I'd only be willing to consider
another array module if it was a clean subset or base class of the
full Numarray functionality; I think the Numarray folks are thinking
about that already.  (This would mean that the Numarray base type
would be part of core Python but that the Numarray folks would
maintain their own ufuncs library on top of it.)

> It's not that much bloat.  It would be a setitem and getitem pair for
> each new type.

If Tim says it's code bloat, it's code bloat.  Don't argue. :-)

> Did you guys really make it possible to unpickle a Unicode string in
> versions of Python that were pre Unicode?

No, but other than that *addition* the pickle format has been pretty
stable.  And I just checked in code that ensures that bools pickled by
2.3 can be unpickled as ints in 2.2 and before (without even
changing the 2.2 unpickler!)
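
A minimal sketch of why this can work, assuming a modern Python 3
interpreter and the text pickle format (protocol 0): a bool is written
with the ordinary integer opcode, so a reader that knows nothing about
bools still sees a plain int.  The helper legacy_load_int is
hypothetical; it only stands in for an older unpickler and is not the
real 2.2 code.

```python
import pickle

# Protocol 0 is the original text format.  A bool is written with the
# integer opcode "I" followed by the payload "01" or "00".
true_bytes = pickle.dumps(True, protocol=0)
false_bytes = pickle.dumps(False, protocol=0)


def legacy_load_int(data):
    # Hypothetical stand-in for a pre-bool reader: it only knows that
    # "I<digits>", a newline and a final "." (STOP) mean an integer.
    assert data[:1] == b"I" and data.endswith(b"\n.")
    return int(data[1:-2].decode("ascii"))


print(true_bytes)                    # the raw pickle bytes
print(legacy_load_int(true_bytes))   # what a bool-unaware reader sees
print(pickle.loads(true_bytes))      # what a bool-aware reader sees
```

The point of the design is that the new feature reuses an encoding the
old reader already accepts, so the other end of a client-server link
does not have to be upgraded first.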

> I would think new features should only work in new versions...

I understand that that's what you think, and that's why we have to
keep explaining to you that for pickling you need to be more
conservative.  Pickling is often used as a client-server mechanism,
and you don't always (want to) have control over the Python version
used on the other end.

> That must have snuck in there sometime after 2.2 I guess.

Yes, the array module has been overhauled significantly.  Please
always use current CVS.

> > But it's a useful thing to be able to do, I agree, and it
> > shouldn't be too hard to add a flag that says "I don't own this
> > memory" -- which would mean that the buffer can't be resized at
> > all.
> 
> I pictured this working like CObjects do where you pass in a destructor
> for when the reference count goes to zero.  Possibly also passing in a
> realloc function.  If the realloc function is null, then an exception
> is raised when someone tries to resize the array.

That'll work too.  Tim already pointed out the other problem -- the C
code that *does* own the memory must know if you still have a
reference to it.
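
The proposal above can be sketched in Python, purely as an illustration
of the ownership rules rather than of any real array API.  The class
BorrowedBuffer and its realloc and destructor parameters are
hypothetical names; the sketch uses an explicit release() call instead
of reference counting so that the behavior is deterministic.

```python
class BorrowedBuffer:
    """Hypothetical wrapper around storage that someone else owns."""

    def __init__(self, storage, realloc=None, destructor=None):
        self._storage = storage
        self._realloc = realloc          # None means "cannot be resized"
        self._destructor = destructor    # called once, on release()

    def __len__(self):
        return len(self._storage)

    def resize(self, new_size):
        # Without a realloc callback the wrapper does not own the
        # memory, so resizing is refused instead of silently copying.
        if self._realloc is None:
            raise BufferError("buffer does not own its memory")
        self._storage = self._realloc(self._storage, new_size)

    def release(self):
        # Tell the real owner that this reference is gone.
        if self._destructor is not None:
            self._destructor(self._storage)
            self._destructor = None


def grow(storage, new_size):
    # Example realloc callback: copy into a fresh bytearray.
    new = bytearray(new_size)
    n = min(len(storage), new_size)
    new[:n] = storage[:n]
    return new


fixed = BorrowedBuffer(bytearray(8))
try:
    fixed.resize(16)
except BufferError as exc:
    print("resize refused:", exc)

resizable = BorrowedBuffer(bytearray(8), realloc=grow)
resizable.resize(16)
print(len(resizable))
```

The destructor callback is what addresses the second problem: it is the
only way the owning code learns that the wrapper no longer refers to
its memory.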

> This means there would need to be a C visible API for building array
> objects around special types of memory though.

That's OK.  Just design one.

> Guido also wrote:
> > Since arrays are all about compromises that trade flexibility for
> > speed and memory footprint, you can't have a one size fits all. :-)
> 
> Bahh.  I don't think getting a good general purpose Python object that
> represents arbitrary C arrays is all that impossible.  C arrays just
> don't do that much.

I'll bet you 10 dollars that your patch will be larger than the code
it patches, if you really implement everything on your wish list.

> Besides I didn't say "one size fits all", I said "one size fits all my
> needs".  That "my" is important (at least to me :-)

But you did offer to make it a standard Python module, so you'll have
to deal with other people's wishes too.  It seems to me that you're
pretty green at this... :-(

> Guido also wrote:
> > > Well if someone authoritative tells me that all of the above is a
> > > great idea, I'll start working on a patch and scratch my plans to
> > > create a "not in house" xarray module.
[I didn't write that, you did; I wrote this:]
> > It all depends on the quality of the patch.  By the time you're done
> > you may have completely rewritten the array module, and then the
> > question is, wouldn't your own xarray module have been quicker to
> > implement, because it doesn't need to preserve backwards
> > compatibility?
> 
> Yup, I think I would be done with my xarray module by now if I had
> written it instead of taking this route.  It would also have the
> disadvantage that it doesn't play nice with anyone else.

But you said you only cared about your own needs? :-)

> I now think the best bet is to replace the array module with something
> flexible enough to:
> 
>   1) do what it currently does

And that's where backwards compatibility is going to kill most of your
innovation.  Consider yourself lucky that the array module currently
doesn't define any C APIs, so at least there aren't any C APIs that
need to keep working...  This is one of the Numarray nightmares.

>   2) do what the Numarray guys need
>   3) do what I need

Sounds like working from some common base of Numarray makes more sense.

> Guido also wrote:
> > An alternative might be a separate bit-array implementation: it seems
> > that the bit-array won't share much code with the regular array (of
> > any flavor), so why not make it a separate type?
> 
> Yup.  It would be nice if a bitarray was actually the same type,

Why?  Have you really thought this through?  It can still have the
same constructor function, but why would being the same type make any
difference?

> but having code like:
> 
>    if (o->is_bitarray) {
>       /* do something */
>    } else {
>       /* do every other byte addressable type */
>    }
> 
> is a little ugly.

In fact, it's why OO programming was invented.  No kidding!
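
A minimal sketch of that point, using hypothetical names (ByteArray,
BitArray, make_array): two separate types share one constructor
function and one indexing interface, so the caller never tests an
is_bitarray flag.

```python
class ByteArray:
    """One byte per element."""

    def __init__(self, n):
        self._data = bytearray(n)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        self._data[i] = value


class BitArray:
    """One bit per element, packed eight to a byte."""

    def __init__(self, n):
        self._n = n
        self._data = bytearray((n + 7) // 8)

    def __len__(self):
        return self._n

    def _check(self, i):
        if not 0 <= i < self._n:
            raise IndexError("bit index out of range")

    def __getitem__(self, i):
        self._check(i)
        return (self._data[i >> 3] >> (i & 7)) & 1

    def __setitem__(self, i, value):
        self._check(i)
        mask = 1 << (i & 7)
        if value:
            self._data[i >> 3] |= mask
        else:
            self._data[i >> 3] &= ~mask & 0xFF


def make_array(kind, n):
    # The shared constructor function: callers pick a kind, not a class.
    return {"byte": ByteArray, "bit": BitArray}[kind](n)


for kind in ("byte", "bit"):
    a = make_array(kind, 10)
    a[3] = 1            # same call for both types, no flag test
    print(kind, a[3], len(a))
```

Each type keeps its own storage layout, and the dispatch that the
if/else performed by hand is done by the method lookup.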

> Well, part of my preference for modifying arraymodule.c instead of
> Numarray is that I very quickly understood what's going on in
> arraymodule.c, and a patch is pretty obvious.  Looking at Numarray, I
> just don't get it yet.  Please take this as a shortcoming in my
> abilities.  Numarray does appear to be the heir-apparent though, so
> I'll give it a better look.

Yes, please do.  (And if you conclude that Numarray stinks, please
write down a careful review and discuss it both with the Numarray
authors and with us -- we don't want our heir to be dead in the
throne. :-)

> I also assumed that the Numarray folks would play nice with the
> standard array module.

I don't think that's one of their requirements, and I don't see why it
should be.

> David also wrote:
> > 
> > I'd like to see fewer multi-dimensional array objects, not more...
> 
> I agree completely.  In fact, I'd like to see one official one
> distributed with the baseline.

Eventually, this will be Numarray, most likely.

--Guido van Rossum (home page: http://www.python.org/~guido/)