[Python-Dev] Expose the array interface in Python 2.5?
Travis E. Oliphant
oliphant.travis at ieee.org
Fri Mar 17 11:40:38 CET 2006
Nick Coghlan wrote:
> Travis E. Oliphant wrote:
>> Would it be possible to add at least the C-struct array interface to the
>> Python arrayobject in time for Python 2.5?
> Do you mean simply adding an __array_shape__ attribute that consists of a
> tuple with the array length, and an __array_type__ attribute set to 'O'?
> Or trying to expose the array object's data?
I was thinking more the __array_struct__ (in particular the C-structure
that defines it).
> The former seems fairly pointless, and the latter difficult (since it has
> implications for moving the data store when the array gets resized).
Sure, it's the same problem as exposing through the buffer protocol.
Since we already have that problem, why pretend we don't?
> I've spent a fair bit of time looking at this interface, and while I'm a big
> fan of the basic idea, I'm not convinced that it makes sense to
> include the interface in the core without *also* adopting a common convention
> for multi-dimensional fixed shape indexing (e.g. by introducing a simple
> dimensioned array type as something like array.dimarray).
True, such a thing would be great, but it could also be written in
Python fairly quickly building on top of the array and serve as a simple
My big quest is to get PIL, PyVox, WxPython, PyOpenGL, and so forth to
be able to use the same interface. Blessing the interface by including
it in the Python core would help. I'm also just wanting people in
py-dev to get the concept of an array interface on their radar, as
discussions of new bytes types emerge.
Sometimes, there is not enough cross-talk between numpy-discussions and
pydev. This is our fault, of course, but we're often swamped (I know I
am...), and it can take some effort for us "array" people to figure out
what's going on in the depths of Python sufficiently to comprehend some
of the discussions here.
> The fact that array.array is a mutable sequence rather than a fixed shape
> array means that it doesn't mesh particularly well with the ideas behind the
> array interface. numpy arrays can have their shape changed via reshape, but
> they impose the rule that the total number of elements can't change so that
> the allocated memory doesn't need to be moved - the standard library's array
> type has no such limitation.
This is not really a limitation of numpy arrays either. Check the
resize method... But I understand your point that array.array objects
are more like lists. Of course, when they behave that way, their buffer
interface is presently broken. So, maybe the array.array is
sufficiently broken to not be worth "fixing", but what else should be done?
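A minimal sketch of why resizing is the awkward case, using only the standard library. array.array.buffer_info() is a real method that returns the current address and item count; whether the storage actually moves on resize depends on the allocator, so the sketch only reports it rather than assuming it.

```python
from array import array

a = array('d', [0.0] * 4)
addr_before, n_before = a.buffer_info()   # (address, number of items)

# Growing the array may force the underlying storage to be reallocated.
# Any raw pointer taken before this point is then no longer safe to use.
a.extend([0.0] * 100000)
addr_after, n_after = a.buffer_info()

# Allocator-dependent: the block may or may not have moved.
moved = addr_before != addr_after
print(n_before, n_after, moved)
```

The address is only meaningful as long as no length-changing operation is applied to the array, which is exactly the guarantee a consumer of exported memory cannot enforce.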
I'm kind of tired of this problem dragging on and on. The Numeric
header (essentially what the __array_struct__ exposes) has been basically
unchanged for over 10 years, and yet its direct support by Python is
still not there. The Python community has been very helpful over the
years, but we need more direct discussion with Python developers to help
things along. I'm grateful Nick has responded. If anyone else has any
interest in these ideas, please sound off.
> Aside from the obvious (the use of Ellipsis and permitting multiple
> dimensions), there are a number of ways in which the semantics of numpy array
> subscripts differ from normal sequence subscripts, and which of these should be
> part of the common multi-dimensional indexing conventions needs to be thrashed
> out in a PEP:
While these are interesting academic issues, the problem with most of
these comments is that you will get loud voices of disapproval if any of
these conventions changes significantly from what has become standard
via Numeric's use over 10 years.
I think no one is up to the task of trying to reconcile Numeric
behavior with Python-dev opinions of what 'ought' to be, unless the
basic usage does not change too much.
> - numpy array slices are views that permit mutation of the original object
> (slicing a sequence creates a copy of the sliced section)
Not really open for discussion among Numeric Python users, as it's been
debated for years, always coming to the same conclusion (keep the current
behavior).
> - assignment to slices is not allowed to change the shape of a numpy array
> (assigning to a slice of a normal sequence may change the total length)
People might be open to this idea, as it adds a new feature and doesn't
significantly change other usages.
> - deletion of slices is not permitted by numpy arrays
> (deleting a slice of a sequence changes the total length)
Also something people might accept.
> - NewAxis is a novel use of subscript notation
True, but not something we can really change.
> - there are sophisticated rules to try to align numpy array shapes
You are speaking of broadcasting. These could of course be discussed,
but current behavior is "entrenched".
> - assignment of a sequence to a numpy array section is rather disconcerting,
> as the checks to determine what should and should not be repeated to fit
> into the available space are type based
I'm not sure what this means... Please elaborate.
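The sequence-versus-fixed-shape differences in the slicing items above can be sketched with the standard library alone. This is an illustration under assumptions, not numpy itself: memoryview (which did not exist in Python 2.5) is used here purely as a stand-in for an object with view semantics and a fixed size.

```python
data = bytearray(b"abcdef")

# Sequence semantics: a slice is a copy, and slice assignment or
# deletion may change the total length.
copy = data[0:3]
copy[0] = ord("X")          # does not affect data
seq = list(b"abcdef")
seq[0:2] = [1, 2, 3]        # length grows from 6 to 7
del seq[0:2]                # length shrinks to 5

# Fixed-shape semantics: a memoryview slice is a view onto the same
# memory, and its size cannot change.
view = memoryview(data)[0:3]
view[0] = ord("Z")          # writes through to data

try:
    view[0:2] = b"xyz"      # size mismatch between target and value
    resized = True
except ValueError:
    resized = False

try:
    del view[0:2]           # deleting part of a view is not supported
    deleted = True
except TypeError:
    deleted = False
```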
> For something in the standard library, much of the complexity should be
> stripped out, with the clever bits of programmer convenience left for numpy to
> provide. However, deciding which bits to remove and which to keep is a
> non-trivial task.
I agree. I suppose your itemization above was really to come to this
conclusion as well. But, I think a stripped-down array that doesn't
try to guess what to do with these interfaces is a good start. In other
words, I disagree that you need to implement multidimensional indexing
in order for Python to support the array interface. All you need is a
simple object that supports the buffer protocol and has the
__array_struct__ method and has a C-structure very similar to the
current NumPy array (which is very similar to the old Numeric
If such a thing were in Python, then NumPy could inherit from it (as
could other array-like objects), with the big advantage that there is at
least one common memory model for arrays. Others could still exist, of
course, but at least there would be a very useful common one.
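As a rough sketch of the kind of C-structure meant here, the layout can be mirrored in ctypes. The field names follow the PyArrayInterface structure as recalled from the NumPy documentation and should be checked against the NumPy headers before being relied on. The ArrayInterface class and the describe() helper are hypothetical names for this illustration, the flags value is a placeholder, and descr is modeled as a plain pointer rather than a PyObject pointer.

```python
import ctypes
from array import array

class ArrayInterface(ctypes.Structure):
    # Assumed to mirror PyArrayInterface; verify against NumPy's headers.
    _fields_ = [
        ("two", ctypes.c_int),        # always 2, a sanity check
        ("nd", ctypes.c_int),         # number of dimensions
        ("typekind", ctypes.c_char),  # kind character, e.g. b"f" for float
        ("itemsize", ctypes.c_int),   # bytes per element
        ("flags", ctypes.c_int),      # how to interpret the data
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("data", ctypes.c_void_p),    # pointer to the first element
        ("descr", ctypes.c_void_p),   # NULL or a data description
    ]

def describe(a):
    """Hypothetical helper: describe a 1-d array.array of doubles."""
    addr, n = a.buffer_info()
    shape = (ctypes.c_ssize_t * 1)(n)
    strides = (ctypes.c_ssize_t * 1)(a.itemsize)
    info = ArrayInterface(
        two=2, nd=1, typekind=b"f", itemsize=a.itemsize,
        flags=0,  # placeholder; real flag bits are not modeled here
        shape=ctypes.cast(shape, ctypes.POINTER(ctypes.c_ssize_t)),
        strides=ctypes.cast(strides, ctypes.POINTER(ctypes.c_ssize_t)),
        data=addr, descr=None)
    # Return the backing buffers too, so the caller keeps them alive.
    return info, (shape, strides)

a = array('d', [1.0, 2.0, 3.0])
info, keepalive = describe(a)
print(info.nd, info.shape[0], info.strides[0])
```

A consumer only needs the structure to locate and interpret the memory; none of the indexing conventions discussed above are involved.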
> Given that even the bytes type has been deferred to 2.6 to allow further
> consideration of the appropriate API, my vote is to do the same for an
> array.dimarray type and allow more time to figure out the appropriate *Python*
I was afraid of that. But, unless people in pydev actually care to
discuss these matters, I fear that yet again nothing will be done. The
problem is that for most of us array users, it's only community outreach
and a desire to get people using Python talking the same array language
that makes us really care about these things. The NumPy library works
fine for what we really need it to do, and it's hard to get motivated to
convince people that haven't used an array-language like IDL or
MATLAB in the past to understand the reasons for NumPy's behavior.
The big difference with the bytes type is that Numeric has 10 years of
history behind it. There is a lot of experience with an appropriate
array type. It's not like we just came up with this a few days ago :-)
As the bytes type is developed, please keep in mind its use as the
memory for an N-dimensional array. Perhaps the bytes object could be a
default way (or built on a default way) to allocate memory. A simple
reference-counted memory object would certainly alleviate the problems
that the array object currently has with the buffer interface.
In other words, the array object should not malloc its own memory but
create a memory object which is nothing more than a reference-counted
pointer to memory. Surely this has been talked about. Is there a reason
it has not been implemented? It would not be that hard.
Even something like that would be a first step.
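The idea can be sketched in pure Python, letting the interpreter's own reference counting stand in for the C-level mechanism. MemoryBlock and SimpleArray are hypothetical names invented for this illustration, and a real implementation would live in C and hold a raw pointer rather than a bytearray.

```python
class MemoryBlock:
    """Hypothetical owner of a chunk of memory; freed when unreferenced."""
    def __init__(self, nbytes):
        self.buf = bytearray(nbytes)

class SimpleArray:
    """Hypothetical array that borrows its storage from a MemoryBlock."""
    def __init__(self, nbytes):
        self.mem = MemoryBlock(nbytes)

    def resize(self, nbytes):
        # Allocate a fresh block instead of reallocating in place. The old
        # block stays alive for as long as anyone still references it.
        new = MemoryBlock(nbytes)
        n = min(nbytes, len(self.mem.buf))
        new.buf[:n] = self.mem.buf[:n]
        self.mem = new

arr = SimpleArray(4)
arr.mem.buf[:] = b"abcd"
exported = arr.mem          # a consumer keeps a reference to the block
arr.resize(8)               # the array moves on to a new block
print(bytes(exported.buf), len(arr.mem.buf))
```

The consumer's view never dangles, at the cost that it may keep looking at the old contents after a resize.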
Thanks for the comments. I'm glad there is another voice here that
cares about the issues involved.