Attached is my PEP for extending the buffer protocol to allow array data to be shared.
PEP: <unassigned>
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant email@example.com
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6
This PEP proposes extending the tp_as_buffer structure to include function pointers that carry information about the intended shape and data-format of the provided buffer. In essence, this will place something akin to the array interface directly into Python.
Several extensions to Python use the buffer protocol to share the location of a data-buffer that is really an N-dimensional array. However, there is no standard way to exchange the additional N-dimensional array information so that the data-buffer is interpreted correctly. The NumPy project introduced an array interface (http://numpy.scipy.org/array_interface.shtml) through a set of attributes on the object itself. While this approach works, it requires attribute lookups, which can be expensive when sharing many small arrays.
One of the key reasons users often ask for something like NumPy to be placed in the standard library is so that it can be used as a standard by other packages that deal with arrays. This PEP provides a mechanism for extending the buffer protocol (which already allows data sharing) with the additional information needed to understand the data. This should benefit all third-party modules that want to share memory through the buffer protocol, such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel, PyMedia, audio libraries, video libraries, etc.
Add a bf_getarrayinfo function pointer to the buffer protocol to allow objects to share additional information about the returned memory pointer. Add the TP_HAS_EXT_BUFFER flag to types that define the extended buffer protocol.
bf_getarrayinfo (PyObject *obj, Py_intptr_t **shape, Py_intptr_t **strides, PyObject **dataformat)
Inputs: obj -- The Python object being questioned.
Outputs: [function result] -- the number of dimensions (n)
*shape -- A C-array of 'n' integers indicating the shape of the array. Can be NULL if n==0.
*strides -- A C-array of 'n' integers indicating the number of bytes to jump to get to the next element in each dimension. Can be NULL if the array is C-contiguous (or n==0).
*dataformat -- A Python object describing the data-format in which each element of the array should be interpreted.
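As a rough illustration of how a consumer might use the shape and strides outputs, the sketch below fills in the strides itself when a producer reports NULL for a C-contiguous array, and then computes the byte offset of one element. The helper names c_contiguous_strides and element_offset are made up for this example and are not part of the proposal; itemsize is assumed to be known from the data-format, and Py_intptr_t is aliased to intptr_t only so that the snippet compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in so the sketch compiles without Python.h (assumption). */
typedef intptr_t Py_intptr_t;

/* Hypothetical helper: fill `out` with the byte strides of a
   C-contiguous array.  The last dimension varies fastest, so its
   stride is itemsize; each earlier stride is the product of the
   later dimensions times itemsize. */
void
c_contiguous_strides(int ndim, const Py_intptr_t *shape,
                     Py_intptr_t itemsize, Py_intptr_t *out)
{
    Py_intptr_t step = itemsize;
    int i;

    assert(ndim >= 0);
    for (i = ndim - 1; i >= 0; i--) {
        out[i] = step;
        step *= shape[i];
    }
}

/* Hypothetical helper: byte offset of the element at `index`,
   given the per-dimension byte strides. */
Py_intptr_t
element_offset(int ndim, const Py_intptr_t *strides,
               const Py_intptr_t *index)
{
    Py_intptr_t offset = 0;
    int i;

    for (i = 0; i < ndim; i++)
        offset += index[i] * strides[i];
    return offset;
}
```

For a 2x3 array of 8-byte elements this gives strides of (24, 8), so element (1, 2) sits 40 bytes past the start of the buffer.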
1) How is data-format information supposed to be shared? A companion proposal suggests returning a data-format object which carries the information about the buffer area.
2) Should the single function pointer call be extended into multiple calls, or should its arguments be compressed into a structure that is filled?
3) Should one or more C-API functions be created to wrap calls to this function pointer, much as is done now with the buffer protocol? What should the interface of this function (or these functions) be?
4) Should a mask (for missing values) be shared as well?
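To make question 2 a little more concrete, one possible form of the "structure that is filled" variant is sketched below. Everything in it is hypothetical: the names array_info_sketch, getarrayinfo_struct_sketch and demo_getarrayinfo, the field layout, and the 0 / -1 return convention are assumptions made for illustration and are not part of this proposal. PyObject and Py_intptr_t are declared as stand-ins so that the snippet compiles without Python.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles without Python.h (assumption). */
typedef intptr_t Py_intptr_t;
typedef struct _object PyObject;

/* Hypothetical structure holding everything the single call
   currently returns through separate pointer arguments. */
typedef struct {
    int ndim;               /* number of dimensions (n) */
    Py_intptr_t *shape;     /* n entries; may be NULL if n == 0 */
    Py_intptr_t *strides;   /* n entries; NULL if C-contiguous */
    PyObject *dataformat;   /* data-format object */
} array_info_sketch;

/* Hypothetical slot signature: 0 on success, -1 on failure. */
typedef int (*getarrayinfo_struct_sketch)(PyObject *obj,
                                          array_info_sketch *info);

/* Toy producer describing a C-contiguous 2x3 array. */
static Py_intptr_t demo_shape[2] = {2, 3};

int
demo_getarrayinfo(PyObject *obj, array_info_sketch *info)
{
    (void)obj;                  /* unused in this toy example */
    info->ndim = 2;
    info->shape = demo_shape;
    info->strides = NULL;       /* C-contiguous, so no strides */
    info->dataformat = NULL;    /* data-format left out of the sketch */
    return 0;
}
```

One argument for a structure of this kind is that fields (for example the mask raised in question 4) could be added later without changing the function signature; the cost is one more type for the C-API to define and version.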
A reference implementation will be supplied when the PEP is accepted.
This document is placed in the public domain.