PEP:
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6

Abstract

    This PEP proposes extending the tp_as_buffer structure to include
    function pointers that incorporate information about the intended
    shape and data-format of the provided buffer.  In essence this will
    place an array interface directly into Python.

Rationale

    Several extensions to Python utilize the buffer protocol to share
    the location of a data-buffer that is really an N-dimensional
    array.  However, there is no standard way to exchange the
    additional N-dimensional array information so that the data-buffer
    is interpreted correctly.  The NumPy project introduced an array
    interface (http://numpy.scipy.org/array_interface.shtml) through a
    set of attributes on the object itself.  While this approach
    works, it requires attribute lookups, which can be expensive when
    sharing many small arrays.

    One of the key reasons that users often request to place something
    like NumPy into the standard library is so that it can be used as
    a standard for other packages that deal with arrays.  This PEP
    provides a mechanism for extending the buffer protocol (which
    already allows data sharing) to add the additional information
    needed to understand the data.  This should be of benefit to all
    third-party modules that want to share memory through the buffer
    protocol, such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
    PyMedia, audio libraries, video libraries, etc.

Proposal

    Add bf_getarrayview and bf_relarrayview function pointers to the
    buffer protocol to allow objects to share a view on a memory
    pointer, including information about accessing it as an
    N-dimensional array.  Add the TP_HAS_ARRAY_BUFFER flag to types
    that define this extended buffer protocol.

    A few additional C-API calls should perhaps also be added to
    Python to facilitate creating new PyArrayViewObjects.

Specification

    static PyObject* bf_getarrayview (PyObject *obj)

        This function must return a new reference to a
        PyArrayViewObject which contains the details of the array
        information exposed by the object.  If failure occurs, then
        NULL is returned and an exception set.

    static int bf_relarrayview(PyObject *obj)

        If not NULL, then this will be called when the object returned
        by bf_getarrayview is destroyed so that the underlying object
        can be aware when acquired "views" are released.

        The object that defines bf_getarrayview should not re-allocate
        memory (re-size itself) while views are extant.  On success, 0
        is returned; on failure, -1 is returned and an error condition
        is set.

    The PyArrayViewObject has the structure

        typedef struct {
            PyObject_HEAD
            void *data;            /* pointer to the beginning of data */
            int nd;                /* the number of dimensions */
            Py_ssize_t *shape;     /* c-array of size nd giving shape */
            Py_ssize_t *strides;   /* SEE BELOW */
            PyObject *base;        /* the object this is a "view" of */
            PyObject *format;      /* SEE BELOW */
            int flags;             /* SEE BELOW */
        } PyArrayViewObject;

    strides -- a c-array of size nd providing the striding
               information, which is the number of bytes to skip to
               get to the next element in that dimension.

    format -- a Python data-format object (PyDataFormatObject) which
              contains information about how each item in the array
              should be interpreted.

    flags -- an integer of flags.  PYARR_WRITEABLE is the only flag
             that must be set appropriately by types.  The other
             flags (PYARR_ALIGNED, PYARR_C_CONTIGUOUS,
             PYARR_F_CONTIGUOUS, and PYARR_NOTSWAPPED) can all be
             determined from the rest of the PyArrayViewObject using
             the PyArrayView_UpdateFlags C-API.

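    The role of shape and strides can be illustrated with a small
    sketch.  It is not part of the proposed C-API: fill_c_strides and
    element_offset are hypothetical helper names, and ptrdiff_t
    stands in for Py_ssize_t so that the code compiles without
    Python.h.  It assumes a C-contiguous layout (last index varies
    fastest) and shows how a consumer might locate one element from
    the data pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the PEP): fill `strides` with the byte
   strides of a C-contiguous array, where the last index varies fastest. */
void
fill_c_strides(int nd, const ptrdiff_t *shape, ptrdiff_t itemsize,
               ptrdiff_t *strides)
{
    ptrdiff_t step = itemsize;
    int i;

    assert(nd >= 0 && itemsize > 0);
    for (i = nd - 1; i >= 0; i--) {
        strides[i] = step;      /* bytes to skip to the next element */
        step *= shape[i];       /* one full run of dimension i */
    }
}

/* Hypothetical helper (not in the PEP): byte offset of the element at
   `index`, measured from the data pointer. */
ptrdiff_t
element_offset(int nd, const ptrdiff_t *strides, const ptrdiff_t *index)
{
    ptrdiff_t offset = 0;
    int i;

    for (i = 0; i < nd; i++) {
        offset += index[i] * strides[i];
    }
    return offset;
}
```

    For a 3 x 4 array of 8-byte items, fill_c_strides yields strides
    of {32, 8}, and element_offset places element [2][1] at byte 72
    from the start of the data.
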
    The PyDataFormatObject has the structure

        typedef struct {
            PyObject_HEAD
            Pysimpleformat primitive;  /* basic primitive type */
            int flags;                 /* byte-order, isaligned */
            int itemsize;              /* SEE BELOW */
            int alignment;             /* SEE BELOW */
            PyObject *extended;        /* SEE BELOW */
        } PyDataFormatObject;

        enum Pysimpleformat {PY_BIT='1', PY_BOOL='?', PY_BYTE='b',
                             PY_SHORT='h', PY_INT='i', PY_LONG='l',
                             PY_LONGLONG='q', PY_UBYTE='B',
                             PY_USHORT='H', PY_UINT='I', PY_ULONG='L',
                             PY_ULONGLONG='Q', PY_FLOAT='f',
                             PY_DOUBLE='d', PY_LONGDOUBLE='g',
                             PY_CFLOAT='F', PY_CDOUBLE='D',
                             PY_CLONGDOUBLE='G', PY_OBJECT='O',
                             PY_CHAR='c', PY_UCS2='u', PY_UCS4='w',
                             PY_FUNCPTR='X', PY_VOIDPTR='V'};

    Each of these simple formats has a special character code which
    can be used to identify this primitive in a nested Python list.

    flags -- flags for the data-format object.  Specified masks are
             PY_NATIVEORDER
             PY_BIGENDIAN
             PY_LITTLEENDIAN
             PY_IGNORE

    itemsize -- the total size represented by this data-format, in
                bytes, unless the primitive is PY_BIT, in which case
                it is the size in bits.  For data-formats that are
                simple 1-d arrays of the underlying primitive, this
                total size can represent more than one primitive
                (with extended still NULL).

    alignment -- for the primitive types this is
                 offsetof(struct {char c; type v;}, v)

    extended -- NULL if this is a primitive data-type or no
                additional information is available.

                If primitive is PY_FUNCPTR, then this can be a tuple
                with >=1 element:
                    (args, {dim0, dim1, dim2, ...})

                args -- a list (of at least length 2) of data-format
                        objects specifying the input argument
                        formats, with the last argument specifying
                        the output argument data-format (use None
                        for void inputs and/or outputs).

                For other primitives, this can be a tuple with >=2
                elements:
                    (names, fields, {dim0, dim1, dim2, ...})

                Use None for both names and fields if they should be
                ignored.

                names -- an ordered list of string or unicode
                         objects giving the names of the fields for
                         a structure data-format.

                fields -- a Python dictionary with ordered keys
                          given by the list in names.  Each entry in
                          the dictionary is a 3-tuple containing
                          (data-format-object, offset,
                          meta-information), where meta-information
                          is Py_None if there is no
                          meta-information.  Offset is given in
                          bytes from the start of the record, or in
                          bits if PY_BIT is the primitive.

                Any additional entries in the extended tuple (dim0,
                dim1, etc.) are interpreted as integers which
                specify that this data-format is an array of the
                given shape of the fundamental data-format specified
                by the remainder of the data-format object.  The
                dimensions are specified so that the last index is
                always assumed to vary the fastest (C-order).

    The constructor of a PyArrayViewObject allocates the memory for
    shape and strides, and the destructor frees that memory.

    The constructor of a PyDataFormatObject allocates the objects it
    needs for fields, names, and shape.

C-API

    void PyArrayView_UpdateFlags(PyObject *view, int flags)
        /* update the flags on the array view object provided */

    PyDataFormatObject *Py_NewSimpleFormat(Pysimpleformat primitive)
        /* return a new primitive data-format object */

    PyDataFormatObject *Py_DataFormatFromCType(PyObject *ctype)
        /* return a new data-format object from a ctype */

    int Py_GetPrimitiveSize(Pysimpleformat primitive)
        /* return the size (in bytes) of the provided primitive */

    PyDataFormatObject *Py_AlignDataFormat(PyObject *format)
        /* take a data-format object and construct an aligned
           data-format object where all fields are aligned on
           appropriate boundaries for the compiler */

Discussion

    The information provided in the array view object is patterned
    after the way a multi-dimensional array is defined in NumPy,
    including the data-format object, which allows a variety of
    descriptions of memory depending on the need.

Reference Implementation

    Supplied when the PEP is accepted.

Copyright

    This document is placed in the public domain.