Re: [Python-Dev] Expose the array interface in Python 2.5?
On 3/17/06, Thomas Heller wrote:
Accessing Python arrays (Numeric arrays, Numeric array, or Numpy array) as ctypes arrays, and vice versa, without copying any memory, would be a good thing.
This does bring up a point. I was thinking that a really bare-bones nd-array type would be very useful by itself as a way to exchange data. However, if ctypes were to return nd-arrays from C function calls, then we would want at least a rudimentary way of directly accessing the data. I'd sure like to see indexing and slicing, at least. Also, if ctypes were to allow nd-arrays as a way to pass data into a C function, then you'd need a built-in Python way to create the data in the first place. If that's too much for now, I'd still like to have the basic data structure defined in the standard library (or built-in) as soon as possible.
Greg Ewing wrote:
It might be all right for writers of new extensions, but there are existing modules (PIL, ctypes, etc.) that already have their own way of storing data, and it seems to me it would be easier for the maintainers of those modules to add a new interface to the existing data than to rearrange their internal structure to use this new C-object.
Can we have both? A defined interface, that existing code can be adapted to provide, and a new C-Object, that future code can just use. If the goal is to have as many extension types as possible use the same base object, the sooner a standard object is provided the better.
There are those of us in the scientific computing community that would love to just have numpy be part of the standard library. It really is big, and maybe too special purpose for that, but at least having the core array in there would be great.
For years, I've been dealing with modules like wxPython that don't understand numpy arrays. As a result, passing in a list of points to be drawn is actually faster than passing in a numpy array of points, even though the numpy array of points has the data in the exact binary representation that wxWidgets expects. The problem is that the wrapper code doesn't know that, because Robin (quite reasonably) didn't want to have wxPython depend on Numeric. While a standard interface could support this, it would be great if wxPython could also do things like pass an image buffer back to Python efficiently.
Another point is that n-dimensional arrays really are very useful for all sorts of stuff that has nothing to do with high-performance Numeric computing. I find myself using numpy for almost every little script I write, even though most of it is not performance bound at all. I suspect that if we get an n-dimensional array type into Python, one that allows n-d slicing, it will see a LOT of use by people who now think they have no use for numpy. My guess is that a 2-dimensional object array would get the most use, but why not support n-d while you're at it? Having an easy and natural way to store and manipulate a "table" of objects, for instance, would be handy for many, many users.
I'm still a tiny bit confused about the proposed individual pieces involved, but I'd like to see, in order of priority (and difficulty):
1) A standard n-d array interface (numpy already defines this, but outside of the numpy community, who knows about it?)
2) A standard, n-d array C-Object, inheritable for use by various extension packages.
3) A standard n-d array object, with a python interface for indexing and slicing for access and assignment (modeled after, or better yet taken directly from, numpy).
4) A standard n-d array object that supports array-wise arithmetic and functions (ufuncs, in numpy parlance).
There is no reason it has to have anything to do with the current array module. For instance, having the arrays be of fixed size is just fine. It's really not that big a deal to make a new one when you need to change the size, after all, we live with that for immutable objects, like strings.
just my $0.02
-Chris
Chris Barker wrote:
I'm still a tiny bit confused about the proposed individual pieces involved, but I'd like to see, in order of priority (and difficulty):
1) A standard n-d array interface (numpy already defines this, but outside of the numpy community, who knows about it?)
This is pretty much a matter of taking Travis's array interface and checking it in as a PEP. Checking it in is the easy part, but I'm not clear on what actually constitutes the array interface.
The array interface is apparently here: http://numeric.scipy.org/array_interface.html but it only talks about exposing this info to Python code - it doesn't discuss how to expose or use it at the C level (unless the intent is that it should be accessed via the PyObject abstract API, in which case a PEP would need to say that explicitly).
However, the top of that page references the genarray PEP: http://svn.scipy.org/svn/PEP/PEP_genarray.txt Here it gets even *more* confusing, because the dimarray objects in that PEP don't appear to expose the array interface described by the page above. If this PEP is out of date, then the reference needs to be removed until it is brought up to speed (a disclaimer on the draft PEP probably wouldn't hurt either).
A useful PEP for a C-level array interface would need to cover the following three things that an extension author would need to know in order to either produce or consume generic arrays:
1. The C-level protocol for exposing and retrieving the array interface (potentially using the PyObject API to access Python-level attributes)
2. How to use the array interface to access an extension type's data
3. How to use the array interface to modify an extension type's data
This is doable in whatever time frame we like, since it is documenting a convention, rather than an implementation, and so isn't coupled directly to the release cycle. However, if it's available soon (before the second alpha?), it may be possible to update arrayobject and ctypes to expose this interface for Python 2.5.
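To make the three points above concrete, here is a minimal Python-level sketch of a producer and a consumer. It assumes the dictionary form of `__array_interface__` that NumPy documents today (keys `version`, `shape`, `typestr`, `data`), which may not match the layout on the page linked above. `PointTable`, `get_item` and `set_item` are hypothetical names invented for the illustration, and the consumer only handles contiguous 2-D native float64 data.

```python
import array
import ctypes
import sys

# Type string for native-endian 8-byte floats, e.g. "<f8" on little-endian.
NATIVE_F8 = ("<" if sys.byteorder == "little" else ">") + "f8"


class PointTable:
    """Hypothetical producer: stores (x, y) pairs in a flat array.array."""

    def __init__(self, values):
        values = list(values)
        if len(values) % 2:
            raise ValueError("expected an even number of coordinates")
        self._store = array.array("d", values)

    @property
    def __array_interface__(self):
        # buffer_info() returns (memory address, number of items).
        address, item_count = self._store.buffer_info()
        return {
            "version": 3,
            "shape": (item_count // 2, 2),
            "typestr": NATIVE_F8,
            "data": (address, False),  # (pointer as int, read-only flag)
        }


def _as_flat_doubles(obj):
    """Consumer side: map the advertised memory without copying it."""
    info = obj.__array_interface__
    if info["typestr"] != NATIVE_F8 or info.get("strides") is not None:
        raise TypeError("this sketch only handles contiguous native float64")
    address, read_only = info["data"]
    n_rows, n_cols = info["shape"]
    flat = (ctypes.c_double * (n_rows * n_cols)).from_address(address)
    return flat, n_cols, read_only


def get_item(obj, row, col):
    """Point 2: read an element through the interface."""
    flat, n_cols, _ = _as_flat_doubles(obj)
    return flat[row * n_cols + col]


def set_item(obj, row, col, value):
    """Point 3: modify an element through the interface."""
    flat, n_cols, read_only = _as_flat_doubles(obj)
    if read_only:
        raise TypeError("producer marked its data read-only")
    flat[row * n_cols + col] = value
```

The consumer never imports the producer's module; it relies only on the attribute name and the dictionary keys, which is the kind of convention a PEP would have to pin down. A C-level consumer would presumably fetch the same attribute through the abstract object API, but that part is not shown here.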
2) A standard, n-d array C-Object, inheritable for use by various extension packages.
3) A standard n-d array object, with a python interface for indexing and slicing for access and assignment (modeled after, or better yet taken directly from, numpy).
For the sake of test writers' sanity, these would probably need to be done together so that at least some of the tests could be written in Python. Due to the conflicting slice semantics between the standard library (slices are copies) and numpy (slices are mutable views), I'd actually suggest that the Python interface for this simple object *shouldn't* support slicing (at least, not in its first incarnation). Regardless, I can't see either of these steps happening before Python 2.6. If we try to do step 2 without doing step 3, then we're stuck either writing all the test code in C, or else shipping an untested component. Neither seems like a good idea.
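The slice-semantics conflict can be seen with standard-library types alone. In the sketch below a list slice stands in for "slices are copies", and a `memoryview` slice stands in for the numpy-style "slices are mutable views". `memoryview` is used only as an illustration available in current Python; it is not the object under discussion in this thread.

```python
# Standard-library sequences: slicing copies the selected items.
data = [10, 20, 30, 40]
copy_slice = data[1:3]
copy_slice[0] = 99          # only the copy changes

# View semantics: the slice shares memory with the original buffer.
raw = bytearray([10, 20, 30, 40])
view_slice = memoryview(raw)[1:3]
view_slice[0] = 99          # writes through to raw[1]

print(data)       # the original list is untouched
print(list(raw))  # the original buffer has been modified
```

Code written against one convention silently misbehaves under the other, which is the reason for leaving slicing out of a first version.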
4) A standard n-d array object that supports array-wise arithmetic and functions (ufuncs, in numpy parlance).
Definitely not before 2.6 :)
There is no reason it has to have anything to do with the current array module. For instance, having the arrays be of fixed size is just fine. It's really not that big a deal to make a new one when you need to change the size, after all, we live with that for immutable objects, like strings.
The array module is the easiest place to put the code - then you would have "array.array" as a 1 dimensional resizeable array, and "array.dimarray" as a multi-dimensional array.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
Chris Barker wrote:
Can we have both? A defined interface, that existing code can be adapted to provide, and a new C-Object, that future code can just use. If the goal is to have as many extension types as possible use the same base object, the sooner a standard object is provided the better.
Having many extension types provide the same *interface* is what I think the main goal should be, not to have them use the same object. So getting the interface defined should be the first priority.
I'd sure like to see indexing and slicing, at least.
The interface itself doesn't need to provide indexing and slicing -- these could be provided by a view object that used the array interface of the underlying object. This would also fit in nicely with the "views" philosophy that seems to be shaping up for Py3k.
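A rough sketch of that separation, under several assumptions: the underlying object exposes the dictionary-style `__array_interface__` that NumPy documents today, the data is contiguous 2-D native float64, and `Storage` and `RowView` are hypothetical names made up for the example. The storage type knows nothing about indexing; the view supplies it.

```python
import array
import ctypes
import sys

NATIVE_F8 = ("<" if sys.byteorder == "little" else ">") + "f8"


class Storage:
    """Hypothetical extension type: owns the data, offers no indexing."""

    def __init__(self, rows, cols):
        self._buf = array.array("d", [0.0] * (rows * cols))
        self._shape = (rows, cols)

    @property
    def __array_interface__(self):
        address, _ = self._buf.buffer_info()
        return {"version": 3, "shape": self._shape,
                "typestr": NATIVE_F8, "data": (address, False)}


class RowView:
    """Hypothetical view: adds indexing and row slicing on top of the interface."""

    def __init__(self, owner, start=0, stop=None):
        info = owner.__array_interface__
        if info["typestr"] != NATIVE_F8:
            raise TypeError("this sketch only handles native float64")
        self._owner = owner  # keep the underlying memory alive
        n_rows, self._cols = info["shape"]
        address, _ = info["data"]
        self._flat = (ctypes.c_double * (n_rows * self._cols)).from_address(address)
        self._start = start
        self._stop = n_rows if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def _offset(self, key):
        row, col = key
        if not (0 <= row < len(self) and 0 <= col < self._cols):
            raise IndexError("index out of range")
        return (self._start + row) * self._cols + col

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise ValueError("this sketch only supports step 1")
            stop = max(stop, start)
            # A slice is another view over the same memory, not a copy.
            return RowView(self._owner, self._start + start, self._start + stop)
        return self._flat[self._offset(key)]

    def __setitem__(self, key, value):
        self._flat[self._offset(key)] = value


store = Storage(3, 2)
view = RowView(store)
tail = view[1:]        # rows 1 and 2, sharing memory with store
tail[0, 1] = 7.0       # writes to row 1, column 1 of the storage
```

Because `RowView` depends only on the interface, the same view class would work over any other producer that exposes compatible data.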
Another point is that n-dimensional arrays really are very useful for all sorts of stuff that has nothing to do with high-performance Numeric computing.
I'm all in favour of including such an object, as long as we keep in mind that this is an orthogonal issue to having an array interface. The discussion still seems to be a bit muddled on this point.
Greg
participants (3): Chris Barker, Greg Ewing, Nick Coghlan