Attached is my PEP for extending the buffer protocol to allow array data
to be shared.
PEP: <unassigned>
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6
Abstract
This PEP proposes extending the tp_as_buffer structure to include
function pointers that incorporate information about the intended
shape and data-format of the provided buffer. In essence this will
place something akin to the array interface directly into Python.
Rationale
Several extensions to Python utilize the buffer protocol to share
the location of a data-buffer that is really an N-dimensional
array. However, there is no standard way to exchange the
additional N-dimensional array information so that the data-buffer
is interpreted correctly. The NumPy project introduced an array
interface (http://numpy.scipy.org/array_interface.shtml) through a
set of attributes on the object itself. While this approach
works, it requires attribute lookups, which can be expensive when
sharing many small arrays.
One of the key reasons users often ask for something like NumPy to
be placed in the standard library is so that it can serve as a
standard for other packages that deal with arrays. This PEP
provides a mechanism for extending the buffer protocol (which
already allows data sharing) to add the additional information
needed to understand the data. This should be of benefit to all
third-party modules that want to share memory through the buffer
protocol, such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
PyMedia, audio libraries, video libraries, etc.
Proposal
Add a bf_getarrayinfo function pointer to the buffer protocol to
allow objects to share additional information about the returned
memory pointer. Add the TP_HAS_EXT_BUFFER flag to types that
define the extended buffer protocol.
Specification:
static int
bf_getarrayinfo (PyObject *obj, Py_intptr_t **shape,
Py_intptr_t **strides, PyObject **dataformat)
Inputs:
obj -- The Python object being queried.
Outputs:
[function result] -- the number of dimensions (n)
*shape -- A C-array of 'n' integers indicating the
shape of the array. Can be NULL if n==0.
*strides -- A C-array of 'n' integers indicating
the number of bytes to jump to get to the next
element in each dimension. Can be NULL if the
array is C-contiguous (or n==0).
*dataformat -- A Python object describing the data-format
each element of the array should be
interpreted as.
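To illustrate how a consumer might use the shape and strides outputs described above, here is a minimal sketch in C. It is not part of the proposal. The helper names contiguous_strides and element_offset are hypothetical, the Py_intptr_t typedef is a stand-in so the code compiles without Python.h, and the itemsize argument is an assumption: the PEP leaves open how the element size would be derived from the dataformat object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Python's Py_intptr_t so this sketch builds without Python.h. */
typedef intptr_t Py_intptr_t;

/* Hypothetical helper: fill 'strides' with the C-contiguous byte strides
 * implied by 'shape' and 'itemsize' (last dimension varies fastest). */
static void
contiguous_strides(int n, const Py_intptr_t *shape,
                   Py_intptr_t itemsize, Py_intptr_t *strides)
{
    Py_intptr_t step = itemsize;
    int i;

    assert(n >= 0);
    for (i = n - 1; i >= 0; i--) {
        strides[i] = step;
        step *= shape[i];
    }
}

/* Hypothetical helper: byte offset of the element at 'index'.
 * A NULL 'strides' is treated as C-contiguous, as in the specification.
 * For n == 0 the loop does not run and the offset is 0. */
static Py_intptr_t
element_offset(int n, const Py_intptr_t *shape, const Py_intptr_t *strides,
               Py_intptr_t itemsize, const Py_intptr_t *index)
{
    Py_intptr_t offset = 0;
    Py_intptr_t step = itemsize;   /* running C-contiguous stride */
    int i;

    assert(n >= 0);
    for (i = n - 1; i >= 0; i--) {
        Py_intptr_t stride = (strides != NULL) ? strides[i] : step;
        offset += index[i] * stride;
        step *= shape[i];
    }
    return offset;
}
```

For a 3x4 array of 8-byte items, the implied C-contiguous strides are (32, 8), so element (1, 2) sits 48 bytes into the buffer. With explicit Fortran-order strides (8, 24), the same element sits at byte 56.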
Discussion Questions:
1) How is data-format information supposed to be shared? A companion
proposal suggests returning a data-format object which carries the
information about the buffer area.
2) Should the single function pointer call be extended into
multiple calls or should its arguments be compressed into a structure
that is filled?
3) Should one or more C-API functions be created to wrap calls to this
function pointer, much as is done now with the buffer protocol? What
should the interface of this function (or these functions) be?
4) Should a mask (for missing values) be shared as well?
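To make the alternative in question 2 concrete, here is a rough sketch in C of what a filled structure could look like. Every name in it (array_info, demo_getarrayinfo, demo_shape) is hypothetical and not part of this PEP. The void pointers stand in for PyObject * so the code compiles without Python.h, and the return convention (0 on success) is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Python's Py_intptr_t so this sketch builds without Python.h. */
typedef intptr_t Py_intptr_t;

/* Hypothetical structure bundling the outputs of bf_getarrayinfo. */
typedef struct {
    int ndim;               /* number of dimensions (n) */
    Py_intptr_t *shape;     /* 'n' entries, may be NULL if n == 0 */
    Py_intptr_t *strides;   /* 'n' entries, NULL if C-contiguous */
    void *dataformat;       /* would be a PyObject * in real code */
} array_info;

/* Shape of a hypothetical exporter that owns a fixed 2x3 array. */
static Py_intptr_t demo_shape[2] = {2, 3};

/* Hypothetical struct-filling variant: the caller passes one pointer
 * and the exporter fills every field. Returns 0 on success. */
static int
demo_getarrayinfo(void *obj, array_info *info)
{
    (void)obj;                 /* unused in this sketch */
    info->ndim = 2;
    info->shape = demo_shape;
    info->strides = NULL;      /* C-contiguous */
    info->dataformat = NULL;   /* data-format object omitted here */
    return 0;
}
```

A structure keeps the call signature stable if fields are added later, for example the mask raised in question 4, at the cost of one more type that exporters and consumers must agree on.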
Reference Implementation
Supplied when the PEP is accepted.
Copyright
This document is placed in the public domain.