![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
I'm attaching my latest extended buffer-protocol PEP, which tries to get the array interface into Python. Basically, it is a translation of the numpy header files into something as simple as possible that can still be used to describe a complicated block of memory to another user. My purpose is to get feedback and criticism from this community before presenting it to the larger Python community. -Travis

PEP: <unassigned>
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant <oliphant@ee.byu.edu>
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6

Abstract

This PEP proposes extending the tp_as_buffer structure to include function pointers that incorporate information about the intended shape and data-format of the provided buffer. In essence this will place an array interface directly into Python.

Rationale

Several extensions to Python utilize the buffer protocol to share the location of a data-buffer that is really an N-dimensional array. However, there is no standard way to exchange the additional N-dimensional array information so that the data-buffer is interpreted correctly. The NumPy project introduced an array interface (http://numpy.scipy.org/array_interface.shtml) through a set of attributes on the object itself. While this approach works, it requires attribute lookups, which can be expensive when sharing many small arrays.

One of the key reasons that users often request that something like NumPy be placed into the standard library is so that it can be used as a standard for other packages that deal with arrays. This PEP provides a mechanism for extending the buffer protocol (which already allows data sharing) to add the additional information needed to understand the data.
This should be of benefit to all third-party modules that want to share memory through the buffer protocol, such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel, PyMedia, audio libraries, video libraries, etc.

Proposal

Add bf_getarrview and bf_relarrview function pointers to the buffer protocol to allow objects to share a view on a memory pointer, including information about accessing it as an N-dimensional array. Add the TP_HAS_ARRAY_BUFFER flag to types that define this extended buffer protocol. A few additional C-API calls should perhaps also be added to Python to facilitate creating new PyArrayViewObjects.

Specification

static PyObject* bf_getarrayview(PyObject *obj)

This function must return a new reference to a PyArrayViewObject which contains the details of the array information exposed by the object. If failure occurs, then NULL is returned and an exception set.

static int bf_relarrayview(PyObject *obj)

If not NULL, then this will be called when the object returned by bf_getarrview is destroyed, so that the underlying object can be aware when acquired "views" are released. The object that defines bf_getarrview should not re-allocate memory (re-size itself) while views are extant. A 0 is returned on success; a -1 is returned (and an error condition set) on failure.

The PyArrayViewObject has the structure:

```c
typedef struct {
    PyObject_HEAD
    void *data;           /* pointer to the beginning of data */
    int nd;               /* the number of dimensions */
    Py_ssize_t *shape;    /* c-array of size nd giving shape */
    Py_ssize_t *strides;  /* SEE BELOW */
    PyObject *base;       /* the object this is a "view" of */
    PyObject *format;     /* SEE BELOW */
    int flags;            /* SEE BELOW */
} PyArrayViewObject;
```

strides -- a c-array of size nd providing the striding information: the number of bytes to skip to get to the next element in that dimension.

format -- a Python data-format object (PyDataFormatObject) which contains information about how each item in the array should be interpreted.

flags -- an integer of flags.
PYARR_WRITEABLE is the only flag that must be set appropriately by types. The other flags -- PYARR_ALIGNED, PYARR_C_CONTIGUOUS, PYARR_F_CONTIGUOUS, and PYARR_NOTSWAPPED -- can all be determined from the rest of the PyArrayViewObject using the UpdateFlags C-API.

The PyDataFormatObject has the structure:

```c
typedef struct {
    PyObject_HEAD
    PySimpleformat primitive;  /* basic primitive type */
    int flags;                 /* byte-order, isaligned */
    int itemsize;              /* SEE BELOW */
    int alignment;             /* SEE BELOW */
    PyObject *extended;        /* SEE BELOW */
} PyDataFormatObject;

enum PySimpleformat {
    PY_BIT='1',     PY_BOOL='?',        PY_BYTE='b',       PY_SHORT='h',
    PY_INT='i',     PY_LONG='l',        PY_LONGLONG='q',   PY_UBYTE='B',
    PY_USHORT='H',  PY_UINT='I',        PY_ULONG='L',      PY_ULONGLONG='Q',
    PY_FLOAT='f',   PY_DOUBLE='d',      PY_LONGDOUBLE='g', PY_CFLOAT='F',
    PY_CDOUBLE='D', PY_CLONGDOUBLE='G', PY_OBJECT='O',     PY_CHAR='c',
    PY_UCS2='u',    PY_UCS4='w',        PY_FUNCPTR='X',    PY_VOIDPTR='V'
};
```

Each of these simple formats has a special character code which can be used to identify the primitive in a nested Python list.

flags -- flags for the data-format object. Specified masks are PY_NATIVEORDER, PY_BIGENDIAN, PY_LITTLEENDIAN, and PY_IGNORE.

itemsize -- the total size represented by this data-format in bytes, unless the primitive is PY_BIT, in which case it is the size in bits. For data-formats that are simple 1-d arrays of the underlying primitive, this total size can represent more than one primitive (with extended still NULL).

alignment -- for the primitive types this is offsetof(struct {char c; type v;}, v).

extended -- NULL if this is a primitive data-type or no additional information is available. If primitive is PY_FUNCPTR, then this can be a tuple with >=1 element: (args, {dim0, dim1, dim2, ...}).

args -- a list (of at least length 2) of data-format objects specifying the input argument formats, with the last argument specifying the output argument data-format (use None for void inputs and/or outputs).
For other primitives, this can be a tuple with >=2 elements: (names, fields, {dim0, dim1, dim2, ...}). Use None for both names and fields if they should be ignored.

names -- an ordered list of string or unicode objects giving the names of the fields for a structured data-format.

fields -- a Python dictionary whose keys are given by the list in names. Each entry in the dictionary is a 3-tuple containing (data-format-object, offset, meta-information), where meta-information is Py_None if there is no meta-information. Offset is given in bytes from the start of the record, or in bits if PY_BIT is the primitive.

Any additional entries in the extended tuple (dim0, dim1, etc.) are interpreted as integers which specify that this data-format is an array of the given shape of the fundamental data-format specified by the remainder of the data-format object. The dimensions are specified so that the last index is always assumed to vary the fastest (C-order).

The constructor of a PyArrayViewObject allocates the memory for shape and strides, and the destructor frees that memory. The constructor of a PyDataFormatObject allocates the objects it needs for fields, names, and shape.
C-API

```c
void PyArrayView_UpdateFlags(PyObject *view, int flags)
    /* update the flags on the array view object provided */

PyDataFormatObject *Py_NewSimpleFormat(PySimpleformat primitive)
    /* return a new primitive data-format object */

PyDataFormatObject *Py_DataFormatFromCType(PyObject *ctype)
    /* return a new data-format object from a ctype */

int Py_GetPrimitiveSize(PySimpleformat primitive)
    /* return the size (in bytes) of the provided primitive */

PyDataFormatObject *Py_AlignDataFormat(PyObject *format)
    /* take a data-format object and construct an aligned data-format
       object where all fields are aligned on appropriate boundaries
       for the compiler */
```

Discussion

The information provided in the array view object is patterned after the way a multi-dimensional array is defined in NumPy -- including the data-format object, which allows a variety of descriptions of memory depending on the need.

Reference Implementation

Supplied when the PEP is accepted.

Copyright

This document is placed in the public domain.
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis, First, thanks for doing this -- Python really needs it!
Ah, I do like reducing that overhead -- I know I use arrays a lot for small data sets too, so that overhead can be significant. I'm not well qualified to review the tech details, but to make sure I have this right:
So if I have some C code that wants to use any array passed in, I can just call bf_getarrayview(obj), and if it doesn't return NULL, I have a valid array that I can query to see if it fits what I'm expecting. Have I got that right? If so, this would be great. By the way, how compatible is this with the existing buffer protocol? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Christopher Barker wrote:
Yes, you could call this, but you would call it from the type object, like this: obj->ob_type->tp_as_buffer->bf_getarrayview(obj). Or more likely (and I should add this to the C-API) you would call PyArrayView_FromObject(obj), which does this under the covers.
It's basically orthogonal. In other words, if you defined the array view protocol, you would not need the buffer protocol at all. But you could easily define both. -Travis
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
yes, that's what I'm looking for -- please do add that to the C-API
OK, so if one of these were passed into something expecting the buffer protocol, then it wouldn't work -- but you could make an object conform to both protocols at once, like numpy does now, I suppose. Very nice. Another question -- is this new approach in response to feedback from Guido and/or other Python devs? This sure seems like a good way to go -- though it seems, from the last discussion I followed on python-dev, that most of the devs just didn't get how useful this would be! -Chris
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
The new approach is a response to the devs and to Sasha, who had some relevant comments. Yes, I agree that many devs don't "get" how useful it would be, because they have not written scientific or graphics-intensive applications. However, Guido is the one who encouraged me at SciPy 2006 to push this, so I think he is generally favorable to the idea. The Python devs will definitely push back. The strongest opposition seems to be from people that don't 'get' it and so don't want "dead interfaces" in Python. They would need to be convinced of how often such an interface would actually get used. I've tried to do that in the rationale, but what would help most is many people actually posting to python-dev in support of the basic idea (you don't have to support the specific implementation --- most are going to be uncomfortable knowing enough to take a stand). There is a great need for people to stand up and say: "We need something like this in Python..." -Travis
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
good sign.
well, none of us want dead interfaces in Python.
let us know when there is a relevant thread to chime in on. However, what we really need is not people like me saying "I need this", but rather people that develop significant existing extension packages saying they'll actually use this in their packages. People like:

- wxPython -- Robin Dunn
- PIL -- Fredrik Lundh
- PyOpenGL -- who?
- PyObjC -- would it be useful there? (Ronald Oussoren)
- matplotlib (but maybe it's already married to numpy...)
- PyGtk?

Who else? I know Robin Dunn is interested in using it in wxPython -- but probably only if someone contributes the code. I hope to do that some day, but I'm only barely qualified to do so. Fredrik accepted your submission of code to use the array interface in PIL, but he seemed skeptical of the idea. Perhaps lobbying (or even just surveying) some of these folks would be useful. -Chris
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Christopher Barker wrote:
It's a good start, but there is also PyMedia, PyVoxel, any video-library interface writers, any audio-library interface writers -- anybody who wants to wrap or write code that does some kind of manipulation on a chunk of data of a specific data-format. There are so many people who would use it that I don't feel qualified to speak for them all. -Travis
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
On Friday 05 January 2007 01:36, Travis Oliphant wrote:
Yeah, I think this is the case for PyTables. However, the PyTables case should be similar to matplotlib's: it needs so many features of NumPy that it is barely conceivable it could live with just an implementation of the array interface. In any case, I think that if the PEP succeeds, it would represent an extraordinary leap towards efficient data interchange between applications that don't need (or are reluctant to include) NumPy for their normal operation. Cheers, --
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
right -- I didn't intend that to be a comprehensive list.
I think this is key -- we all know that there are a lot of people that *could* use it, and we might even say *should* use it. The question that I think the core Python devs want answered is whether they *will* use it. That's why I suggest that rather than having a bunch of numpy users make comments to python-dev, we really need authors of packages like the above to make comments to python-dev, saying "I could use this, and I *will* use this if it's put into the standard lib".

I do think there is one issue that does need to be addressed. The current buffer protocol already allows modules to share data without copying -- but it doesn't provide any description of that data. This proposal would provide more description of that data, but still not describe it completely -- that's just not possible. So how helpful is the additional description? I think very, but others are not convinced. Some examples, from my use. I use wxPython a fair bit, so I'll use that as an example.

Example 1: One can currently pass a buffer into a wx.Image constructor to create an image from existing data. For instance, you can pass in a WxHx3 numpy array of unsigned bytes to create a WxH RGB image. At the moment, all the wx.Image constructor checks is whether you've passed in the correct number of bytes. With this proposal, it could check that you passed in a WxHx3 array of bytes. How helpful would that be? It's still up to the programmer to make sure those bytes actually represent what you want. You may catch a few errors, like accidentally passing in a 3xWxH array. You also wouldn't have to pass in the size of the image you wanted -- which would be kind of nice. One could also write the Image constructor so that it could take a few different shapes of data, and do the right thing with each of them. How compelling are these advantages?
Example 2: When you need to draw something like a whole lot of pixels or a long polyline, you can currently pass into wxPython either a list of (x,y) tuples, or any sequence of (x,y) sequences. An Nx2 numpy array appears as the latter, but it ends up being slower than the list of tuples, because wxPython has some code to optimize accessing lists of tuples. Internally, wxWidgets has drawing methods that accept an Nx2 c-array of ints. With the proposed protocol, wxPython could recognize that such an array was passed in, and save a LOT of sequence unpacking, type checking, converting, etc. It could also take multiple data types -- floats, ints, etc. -- and do the right thing with each of those. This, to me, is more compelling than the Image example.

By the way, Robin Dunn has said that he doesn't want to include a numpy dependency in wxPython, but would probably accept code that did the above if it didn't add any dependencies.

Francesc Altet wrote:
That's the question -- is it an extraordinary leap over what you can now do with the existing buffer protocol? Matthew Brett wrote:
Is there already, or could there be, some sort of consortium of these that agree on the features in the PEP?
There isn't now, and that's my question -- what is the best way to involve the developers of some of the many packages that we envision using this?

- random polling by the numpy devs and users?
- more organized polling by numpy devs (or Travis)?
- a note encouraging them to pipe in on the discussion here?
- a note encouraging them to pipe in on the discussion at python-dev?

I think the PEP has far more chance of success if it's seen as a request from a variety of package developers, not just the numpy crowd (which, after all, already has numpy). -Chris
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
Christopher Barker wrote: [SNIP] projects on board would help a lot; it might also reveal some deficiencies in the proposal that we don't see yet. I've only given the PEP a quick read-through at this point, but here are a couple of comments:

1. It seems very numpy-centric. That's not necessarily bad, but I think it would help to have some outsiders look it over -- perhaps they would see things that they need that it doesn't address. Conversely, there may be universal opinion that some parts of it aren't needed, and we can strip the proposal down somewhat.

2. It seems pretty complicated. In particular, the PyDataFormatObject seems pretty complicated. This part in particular seems like it might be a hard sell, so I expect it is going to need considerably more motivation. For example:

   1. Why do we need Py_ARRAYOF? Can't we get the same effect just by using longer shape and strides arrays?
   2. Is there any type besides Py_STRUCTURE that can have names and fields? If so, which, and what do they mean? If not, you should just say that.
   3. And on this topic, why a tuple of ([names,..], {field})? Why not simply a list of (name, dfobject, offset, meta), for example? And what's the meta information if it's not Py_None? Just a string? Anything at all?

I'll try to give it a more thorough reading over the weekend. -tim
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Tim Hochberg wrote:
It would help quite a bit. Are there any suggestions of who to recruit to review the proposal? We should not forget that the NumPy world is quite diverse as well.
I've only given the PEP a quick read-through at this point, but here are a couple of comments:
Thank you for taking the time to read through it. I know it takes precious effort to do all this, which is why it's been so slow in coming from my end. It is important to get a lot of discussion on something like this. A lot of what is in the PEP stems from discussion that's happened over the past 10 years, but admittedly some of it doesn't (the extended data-format descriptions, for example).
Yes, this is true. I took the struct module, NumPy, and ctypes as a guide for "what is needed" to be described in terms of memory.
Yes, the PyDataFormatObject is complicated --- but I don't think unnecessarily so. I've already stripped a lot away from what's in NumPy to reduce it. The question really is: how are you going to describe what an arbitrary chunk of memory represents? One could restrict it to primitive types, replace the PyDataFormatObject with the enumerated type, and just give up on describing more complicated structures. But my contention is: why? Numarray, NumPy, and ctypes have already laid a tremendous amount of groundwork in how we can represent complicated data-structures. They clearly exist, so why shouldn't we have some mechanism to describe them? Once you decide to handle complicated types, you need to replace the simple enumerated type with something that is "self-recursive" (i.e. so you can have fields of arbitrary data-types). This lends itself to some kind of structure design like the PyDataFormatObject. The only difference between what I've proposed and the ctypes approach is that ctypes overloads Python type objects (in other words, the PyDataFormatObject equivalent in ctypes is at its core a PyTypeObject, while here it is built on PyObject).
1. Why do we need Py_ARRAYOF? Can't we get the same effect just using longer shape and strides arrays?
Yes, this is true for a single data-format in isolation (and in fact exactly what you get when you instantiate in NumPy a data-type that is an array of another primitive data-type). However, how do you describe a structure whose second field is an array of a primitive type? This is where the ARRAYOF qualifier is needed. In NumPy it's actually not done this way; a separate subarray field in the data-type object is used. After studying ctypes, however, I think this approach is better.
Yes, you can add fields to a multi-byte primitive if you want. This would be similar to thinking of the data-format as a C-like union. Perhaps the data field has meaning as a 4-byte integer, but the most-significant and least-significant bytes should also be addressable individually.
The list of names is useful for having an ordered list so you can traverse the structure in field order. It is technically not necessary but it makes it a lot easier to parse a data-format object in offset order (it is used a bit in NumPy, for example). The meta information is a place holder for field tags and future growth (kind of like column headers in a spreadsheet). It started as a place to put a "longer" name or to pass along information about a field (like units) through. -Travis
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/6/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Before I can answer that, I need to ask you a question. How do you see this extension to the buffer protocol? Do you see it as a supplement to the earlier array protocol, or do you see it as a replacement? The reason I ask is that the two projects I use regularly -- wxPython and PIL -- generally operate on relatively large data chunks, and it's not clear that they would see much benefit from this mechanism versus the array protocol. I imagine that between us Chris Barker and I could hack together something for wxPython (not that I've asked him about it). And code would probably go a long way toward convincing people what a great idea this is. However, all else being equal, it'd be a lot easier to do this for the array protocol, since there's no extra infrastructure involved. [SNIP]
OK. Needed for recursive data structures -- check.
Hmm. I think I understand this somewhat better now, but I can't decide if it's cool or overkill. Is this supporting a feature that ctypes has?
Right, I got that. Between names and fields you are simulating an ordered dict. What I still don't understand is why you chose to simulate this ordered dict using a list plus a dictionary, rather than a list of tuples. This may well just be a matter of taste. However, for the small sizes I'd expect of these lists, I would expect a list of tuples to perform better than the dictionary solution. The meta information is a place holder for field tags and future growth
FWIW, the array protocol PEP seems more relevant to what I do, since I'm not so concerned with the overhead: I'm sending big chunks of data back and forth. -tim
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Timothy Hochberg wrote:
This is a replacement for the previously described array protocol PEP. This is how I'm trying to get the array protocol into Python. In that vein, it has two purposes.

One is to make a better buffer protocol that includes a conception of an N-dimensional array in Python itself. If we can include this in Python, then we get a lot of mileage out of all the people that write extension modules for Python that should really be making their memory available as an N-dimensional array (every time I turn around there is a new wrapping of some library that is *not* using NumPy as the underlying extension). With the existence of ctypes it just starts to get worse, as nobody thinks about exposing things as arrays anymore, and so NumPy users don't get the ease of use we would have if the N-dimensional array concept were a part of Python itself.

For example, I just found the FreeImage project, which wraps a nice library using ctypes. But it doesn't have a way to expose these images as numpy arrays. Now, it would probably take me only a few hours to make the connection between FreeImage and NumPy, but I'd like to see the day when it happens without me (or some other NumPy expert) having to do all the work. If ctypes objects exposed the extended buffer protocol for appropriate types, then I wouldn't have to do anything: the wrapped structures would be exposable as arrays, and all of a sudden I say a = array(freeimobj) and I can do math on the array in Python. Or, if I'm an extension-module writer, I don't need to have NumPy (or rely on it) in order to do some computation on freeimobj in C itself. Sure, you can do it now (if the array protocol is followed --- but not many people have adopted it yet --- some have argued that it's "not in Python itself"). So, I guess, the big reason I'm pushing this is largely marketing. The buffer protocol is the "right" place to put the array protocol.

The second reason is to ensure that the buffer protocol itself doesn't "disappear" in Python 3000.
Not all the Python devs seem to really see the value of it. But, it can sometimes be unclear as to what the attitudes are.
I don't know. It's basically a situation where it's easier to support it than not, and so it's there.
Ah. I misunderstood. You are right that if I had considered needing an ordered list of names up front, this kind of thing makes more sense. I think the reason for the choice of dictionary is that I was thinking of field access as attribute look-up, which is just dictionary look-up. So, conceptually, that was easier for me. But tuples are probably less overhead (especially for small numbers of fields), at the expense of having to search for the field name on field access. But I'm trusting that dictionaries (especially small ones) are pretty optimized in Python (I haven't tested that assertion in this particular case, though).
This proposal is trying to get the array protocol *into* Python. So, this is the array protocol PEP. Anyone supportive of the array protocol should be interested in and thinking about this PEP. -Travis
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Timothy Hochberg wrote:
But is this mechanism any harder? It doesn't look like it to me. In fact, as I have written a tiny bit of Numeric extension code, this looks familiar and pretty easy to work with.
I imagine that between us Chris Barker and I could hack together something for wxPython (not that I've asked him about it).
I'm not sure when I'll find the time, but I do want to do this.
Is it that much infrastructure? It looks like this would, at the least, require an extra include file. If this flies, then will it be delivered with Python 2.6? Until then (and for older Pythons), would various extension writers all need to add this extra file to their source? And might we get a mess, with different versions floating around out there trying to interact?
That's the biggest issue, but I think a lot of us use a lot of small arrays as well -- and while I don't know if it's a performance hit worth worrying about, it's always bugged me that it is faster to convert to a Python list and then pass it in to wxPython than it is to just pass in the array directly. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/9/07, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Let me preface my remarks by saying that I was initially assuming this was meant as a supplement to the earlier array protocol proposal, not a replacement, as Travis subsequently explained.

But is this mechanism any harder? It doesn't look like it to me. In fact, as I have written a tiny bit of Numeric extension code, this looks familiar and pretty easy to work with.
I expect that the old proposal is easier to implement right now. We could implement the old array protocol in wxPython and have fairly seamless integration with numpy without any dependencies. To implement the new protocol, we'd need the C-API. Since that's not in Python at the moment, we'd have to include implementations of the various functions in wxPython. I suppose that wouldn't really be that bad, assuming they are already implemented somewhere. It's a bit of a chicken-and-egg problem, though.
I'm not sure. It may not be that bad. I'd guess you'd need both an include file and a source file for the implementations of the functions. It looks like this would, at the least, require an extra include file.
Possibly. It shouldn't be a big deal if the API is frozen. But I expect the best way to get this to work would be to implement it for as many projects as possible as trial patches before trying to get it into those projects officially. That way we can get some experience, tweak the API if necessary, then freeze it and release it officially. Like I said, I'll help with wxPython. I'm tempted to try PIL as well, but I've never looked at the code there, nor even tried to compile it, so I don't know how far I'd get.
You should have seen it in the old days before I sped it up. Actually, I think you probably did. Anyway, wxPython seems like low-hanging fruit in the sense that we could probably get it done without too much trouble. It's possible that Robin may not accept the patch until the relevant code goes into Python, but just having a patch available would be a useful template for other projects and would show the performance gains this approach could lead to. At least I sure hope so. Travis: does the code implementing the C API exist already, or is that something that still needs to be written? -tim
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
A few comments regarding what I think justifies some sort of standard as part of Python (understanding that there are various ways it could be done, so I'm not commenting on the specifics here directly). I don't think there is any harm in making the standard numpy-centric. In fact, I think the selling point is that this standard means that any extension can expose repetitive data to those that want to manipulate it *in Python* in a simple and standard way. While it's possible that ultimately there are those that will pass these sorts of things from one extension into another, I don't see that as a common use for a while. What it does mean is that if you want a simple Python-only way of seeing and modifying such data, all you need to do is install numpy -- you don't have to write a C extension. If the extension writers use descriptive field names, they can make the arrays they expose somewhat self-documenting, so that turning one into an array and doing a little introspection may tell you all you need to know about the data.

Rather than try to sell this as some neutral interface, I would make the numpy dependence explicit (not that it excludes extension-to-extension direct use). It may be that the developers of various extensions are not the ones most interested in this capability (after all, they've built it to do what they want), but I wouldn't be surprised if many of the users of those extensions would like it, so they can do things the extension doesn't allow them to do. So one approach is to see what the respective user communities think about such capabilities. If they find out what this can do for them, they may pressure the developers for such support. People in the numpy community can also volunteer to implement the standard (though with this approach it's a bit of a chicken-and-egg thing, as someone has mentioned: you can't do it if it isn't in Python yet).
I do agree that the most persuasive approach would be to have at least some of the 3rd party extensions support this explicitly on the python-dev list. Perry
![](https://secure.gravatar.com/avatar/2d1562d092a4d90284163439d5596556.jpg?s=120&d=mm&r=g)
On Thursday 04 January 2007 19:36, Travis Oliphant wrote:
Two more places to look for projects that may be interested: SQL wrappers, such as Psycopg2, and the Python DB API 2.0 community QuantLib (see the message below from the enthought-dev mailing list.) On Saturday 03 February 2007 00:23, Prabhu Ramachandran wrote:
![](https://secure.gravatar.com/avatar/b24e93182e89a519546baa7bafe054ed.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
It would help me to understand the proposal if it could be explained in terms of the methods of the existing buffer class/type: ['__add__', '__class__', '__cmp__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__len__', '__mul__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__'] Numpy extends numarray's type/dtype object. This proposal appears to revert to the old letter codes. I have had very limited experience with C. Colin W.
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Colin J. Williams wrote:
It extends what is done in the array and struct modules of Python. The old letter codes are useful at the C level. They are 'hidden' behind an enumeration, however, and so should not be a big deal. But the letter codes are still useful in other contexts.
I have had very limited experience with C.
Then this proposal will not be meaningful for you. This is a proposal to extend something on the C-level. There is nothing on the Python level suggested by this proposal. -Travis
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
I'm wondering if having the buffer object specify the view is the right choice. I think the best choice is to separate the design into:

buffer: provides an interface to memory
array: provides a view of memory as an array of whatever dimensions

1. buffer may or may not map to contiguous memory.
2. multiple views of the same memory can be shared. These different views could represent different slicings.
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
Several extensions to Python utilize the buffer protocol to share the location of a data-buffer that is really an N-dimensional array. However, there is no standard way to exchange the additional N-dimensional array information so that the data-buffer is interpreted correctly. I am questioning whether this is the best concept. It says that the data-buffer will carry the information about its interpretation as an N-dimensional array. I'm thinking that a buffer is just an interface to memory, and that the interpretation as an array of n dimensions, for example, is best left to the application. I might want to at one time view the data as n-dimensional, but at another time as 1-dimensional, for example.
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On 1/5/07, Stefan van der Walt <stefan@sun.ac.za> wrote:
I think Neal is suggesting some object that basically does nothing but hold a pointer (or pointers) to memory. This memory can be used in various ways, one of which is to use it to construct another type of object that provides a view with indices and such, i.e., an array. That way the memory isn't tied to arrays and could conceivably be used in other ways. The idea is analogous to the data/model/view paradigm. It is a bit cleaner than just ignoring the array parts. Chuck
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Neal Becker wrote:
The simple data-buffer interpretation is still there. You can still use the simple "chunk-of-memory"-only interpretation of the buffer. All we are doing is adding a way for applications to ask if the object can be interpreted as a strided N-dimensional array of a particular data-format. So, this proposal does nothing to jeopardize the buffer-as-an-interface-to-memory-only model. I'm only using a table of function pointers which is already there (tp_as_buffer) rather than requesting an additional table of function pointers on the type object (tp_as_array). I see the array view idea as fitting very nicely with the buffer protocol. -Travis
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
Neal Becker wrote:
Sure, but you need a standard way to communicate that extra information between different parts of your code and also between different third party libraries. That is what this PEP intends to provide. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/4/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
still be used to describe a complicated block of memory to another user.
Thinking of the scope "seamless data exchange between modules", my concern with this PEP is that it might be too focused on "block of memory" rather than "access to data". Data that can be interpreted as an n-dimensional array doesn't necessarily have to be represented directly as a block of memory. Example 1: we have a very large amount of data with a compressed internal representation. Example 2: we might want to generate data "on the fly" as it's needed. Example 3: if module creators have to deal with different byte alignments, contiguousness etc., it'll lead to lots of code duplication and a lot of unnecessary work. Is it possible to add a data-access API to this PEP? Direct memory access could be available through this API with a function that returns the memory address (or NULL if not available). We could have a default implementation for basic types, with the option for module creators to override it. The problem with this, if we stick to the buffer protocol, is that it breaks the concept "buffer is memory", if that ever was a valid one. This is of minor concern for me though.
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Torgil Svensson wrote:
Could you give an example of what you mean? I have no problem with such a concept. I'm mainly interested in getting the NumPy memory model into Python some-how. I know it's not the "only" way to think about memory, but it is a widely-used and useful way. -Travis
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/11/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Sure. I'm not objecting to the memory model; what I mean is that data access between modules has a wider scope than just a memory model. Maybe I'm completely out of scope here; I thought this was worth considering for the inter-module data-sharing scope. Say we want to access a huge array with 1 million text strings from another module that has a compressed representation in memory. Here's a pseudo-code example with most of the details completely made up:

```c
buffer = AnotherModule_GetBigArrayAsBuffer()
aview = buffer->bf_getarrayview()
indexes = NewList()
for (i = 0; i < aview->shape[0]; ++i)
    for (j = 0; j < aview->shape[1]; ++j) {
        item = aview->get_from_index(i, j)
        /* item represents the data described by the PyDataFormatObject */
        if (is_interesting_item(item))
            ListAdd(indexes, NewList(i, j))
    }
indexarr = Numpy_ArrayFromLists(indexes)
```

Here, we don't have to care about any data-layout issues; the called module could even produce data on the fly. If I want direct memory access we could use a function that returns data, strides and flags.
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On 1/11/07, Torgil Svensson <torgil.svensson@gmail.com> wrote:
This is where separating the memory block from the API starts to show advantages. OTOH, we should try to keep this all as simple and basic as possible. Trying to design for every potential use will lead to over-design; it is a fine line to walk. <snip> Chuck
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/11/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
I agree. I'm trying to look after a use case of my own here where I have a huge array (that won't fit in memory) with data that is very easy to compress (it easily fits in memory). OTOH, I have as yet no need to share this between modules, but a simple data-access API opens up a variety of options. In my mindset, I can slice and dice my huge array and the implementation behind the data-access API will choose between having the views represented internally as intervals or lists of indexes. So I'm +1 for having all information concerning nd-array access on a logical level (shapes) in one API and letting the memory-layout details (strides, FORTRAN, C etc.) live in another API; a module that wants to try to skip the API overhead (numpy) can always do something like:

```c
memory_interface = array_interface->get_memory_layout();
if (memory_interface) {
    /* ... use memory_interface->strides ... etc. */
} else {
    /* ... use array_interface->get_item_from_index() ... etc. */
}
```

I'm guessing that most of the modules trying to access an array will choose to go through numpy for fast operations. Another use of an API is to do things like give an "RGB" view of an image, regardless of whatever weird image format lies below, without having to convert the whole image in memory and lose precision or memory. If we want the whole in-memory RGB copy we could just take the RGB view, pass it to numpy and force numpy to do a copy. The module can then, in either case, operate on the image through numpy or return a numpy object to the user. (numpy is of course integrated in Python by then.)
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Torgil Svensson wrote:
I think this is a good idea generally. I think the PIL would be much more open to this kind of API because the memory model of the PIL is different than ours. On the other hand, I think it would be a shame not to provide a basic N-d array memory model like NumPy has, because it is used so often.
I had originally thought to separate these out in to multiple calls anyway. Perhaps we could propose the same thing. Have a full struct interface as one option and a multiple-call interface like you propose as another.
array_interface->get_block_from_slice() ? Such a thing would be very useful for all kinds of large data-sets, from images, and videos, to scientific data-sets.
Getting this array_interface into Python goes a long way into making that happen, I think. -Travis
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
I believe we are converging, and this is pretty much the same design as I advocated. It is similar to boost::ublas. Storage is one concept. Interpretation of the storage is another concept. Numpy is a combination of a storage and interpretation. Storage could be dense or sparse. Allocated in various ways. Sparse can be implemented in different ways. Interpretation can be 1-d, 2-d. Zero-based, non-zero based. Also there is question of ownership (slices).
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Neal Becker wrote:
I believe we are converging, and this is pretty much the same design as I advocated. It is similar to boost::ublas.
I'm grateful to hear that. It is nice when ideas come from several different corners.
How do we extend the buffer interface then? Do we have one API that allows sharing of storage and another that handles sharing of interpretation? How much detail should be in the interface regarding storage details? Is there a possibility of having at least a few storage models "shareable" so that memory can be shared by others that view the data in the same way? -Travis
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
How about:

1. A memory concept, of which buffer is an example.
2. A view concept.
3. A variety of common concrete types composing 1+2.

So then, how do we use buffer in this scheme? I'm thinking that buffer isn't really the best thing to build on - but within this scheme buffer is a kind of memory (assuming it provides/could_be_made_to_provide the required interface). The view is not part of buffer, (as was proposed) but a separate piece. Still, I agree that we want a commonly used array object that includes both the memory and the view. I propose that we build it out of these more generic pieces, but also provide commonly used compositions of these pieces. I think this satisfies the desire for a self-describing array component, while allowing more flexibility and serving a wider usage.
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/12/07, Travis Oliphant <oliphant@ee.byu.edu > wrote:
I'm concerned about the direction that this PEP seems to be going. The original proposal was borderline too complicated IMO, and now it seems headed in the direction of more complexity. Also, it seems that there are three different goals getting conflated here. None are bad, but they don't, and probably shouldn't, all be addressed by the same PEP.

1. Allowing producers and consumers of blocks of data to share blocks efficiently. This is half of what the original PEP proposed.
2. Describing complex data types at the C level. This is the other half of the PEP[1].
3. Things that act like arrays, but have different storage methods. The details of this still seem pretty vague, but to the extent that I can figure them out, it doesn't seem useful or necessary to tie this into the rest of the array interface PEP. For example, "array_interface->get_block_from_slice()" has been mentioned. Why that instead of "PyObject_AsExtendedBuffer(PyObject_GetItem(index), ....)"[2]?

I'll stop here, till I see some more details of what people have in mind, but at this point, I think that alternative memory models are a different problem that should be addressed separately. Sadly, I'm leaving town shortly and I'm running out of time, so I'll have to leave my objections in this somewhat vague state. Oh, the way that F. Lundh plans to expose PIL's data a chunk at a time is mentioned in this python-dev summary: http://www.python.org/dev/summary/2006-11-01_2006-11-15/ It doesn't seem necessary to have special support for this; all that is necessary is for the object returned by acquire_view to support the extended array protocol.

[1] Remind me again why we can't simply use ctypes for this? It's already in the core. I'm sure it's less efficient, but you shouldn't need to parse the data structure information very often. I suspect that something that leveraged ctypes would meet less resistance.
[2] Which reminds me. 
I never saw in the PEP what the actual call in the buffer protocol was supposed to look like. Is it something like: PyObject_AsExtendedBuffer(PyObject * obj, void **buffer, Py_ssize_t *buffer_len, funcptr *bf_getarrayview, funcptr *bf_relarrayview) ? -- //=][=\\ tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/12/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
Looks like an array, acts like an array, smells like an array = is an array
What is an "Extended buffer" ? Connecting that to array information doesn't feel intuitive.
I agree
[1] Remind me again why we can't simply use ctypes for this?
1. ctypes is designed for "C types", not "array layout" 2. managing/creating complex formats in ctypes deviates considerably from the clean, intuitive and simple (compared to dtypes) => ugly code 3. Can ctypes handle anonymous lambda function pointers?
the core. I'm sure it's less efficient, but you shouldn't need to parse the data structure information very often.
I believe that'll be more common than you think; for example, dynamically creating/combining/slicing recarrays with various data. //Torgil
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/12/07, Torgil Svensson <torgil.svensson@gmail.com> wrote:
I was unclear here. I didn't mean like that it would be infrequent in "once a month" sense. I meant that you would only need to look at the data structure information once per set of data that you are accessing and that you would typically extract many chunks of data from each set, so the amortized cost of parsing the data structure would be small. Trying to get out the door.... -- //=][=\\ tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Well at least people are talking about what they would like to see. But, I think we should rein in the discussion.
I'm leaning this way too.
Yes, I agree.
Two reasons: 1) ctypes wasn't designed for this purpose specifically and leaves out certain things 2) ctypes uses many Python types instead of just a single Python type (the PyDataFormatObject).
No, not like that. The bf_getarrayview function pointers hang off of the as_buffer method table which is pointed to by the type object. You could always access the API using those function pointers, but it is more traditional to use an API call which adds some checking to make sure the function pointer is there before calling it. I don't know if I go into a lot of detail there, but I should probably add more. PEPs are rather "expensive" for me in terms of how much immediate benefit the change to Python is to me personally versus the time spent writing them. The benefit here is much more long term in establishing a useful data-model that could be used by a lot of applications in Python to exchange data (and help ameliorate the proliferation of objects in Python that are essentially, and should be, NumPy arrays). -Travis
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
Talking about the difference between the memory access model and the array API, maybe I am talking nonsense (I know next to nothing about these problems), but couldn't an efficient tree data structure be implemented on the memory buffer object? I am pretty sure a simple read-only tree could; as for a tree that is edited, I am not so sure. Anyhow, read-only trees are used a lot by some people. A lab next to mine uses them to describe results from their experiments. They store events in tree-like structures (I have been told they copied that from CERN). They can then walk through the tree in a very efficient way, and do statistical analysis on their collection of events. I am not sure if this can fit anywhere in the PEP, but it would sure enlarge its scope. Please enlighten me. Gaël
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On 1/12/07, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
Trees are nice, but they are not efficient for array-type data. Traversing a tree usually requires some sort of stack (recursion), and a tree is not well structured for addressing data using indices. They just aren't appropriate for arrays; arrays are better represented by some sort of lattice.
Probably from ROOT?
There is probably a tree module somewhere for Python. Chuck
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
On Fri, Jan 12, 2007 at 12:44:15AM -0700, Charles R Harris wrote:
Yes, indeed. I was just wondering if the PEP could be used for a performant implementation of trees. Basically that is mapping a tree to an array, which is possible. As far as performance goes, I think this is not performant at all when modifying the tree, but I do not know whether it is possible to traverse the tree efficiently when it is mapped to an array.
Probably from ROOT?
Yes. It seems like nice software for such things. The problem with it is that you have to learn C++, and experience shows that not everybody in an experimental lab is willing to do so. Gaël
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Gael Varoquaux wrote:
Yes, indeed. I was just wondering if the PEP could be used for a performant implementation of trees.
That would be a whole new PEP, and one we're not the least bit ready for.
Basicaly that is mapping a tree to an array, which is possible.
Possible, but probably not very useful for dense data -- maybe for sparse arrays? The idea of an array API, rather than (actually, in addition to) an array data structure is fabulous! It could be used for sparse arrays, for instance. I do think it's a topic for another PEP, and probably not even a PEP until we have at least some working code - maybe a sparse array and/or PIL image?
I think a slicing API is critical -- at least at the Python level, though at the C level it would sure be nice, and probably could allow for some good optimizations for getting a "block" of data out of some odd data structure. "Simple is better than complex." "Although practicality beats purity." "Now is better than never." Tells me that we should just focus on the array data structure for the PEP now. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis, First, thanks for doing this -- Python really needs it!
Ah, I do like reducing that overhead -- I know I use arrays a lot for small data sets too, so that overhead can be significant. I'm not well qualified to review the tech details, but to make sure I have this right:
So if I have some C code that wants to use any array passed in, I can just call: bf_getarrayview(obj) and if it doesn't return NULL, I have a valid array that I can query to see if it fits what I'm expecting. Have I got that right? If so, this would be great. By the way, how compatible is this with the existing buffer protocol? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Christopher Barker wrote:
Yes, you could call this (but you would call it from the type object, like this: obj->ob_type->tp_as_buffer->bf_getarrayview(obj)). Or more likely (and I should add this to the C-API) you would call PyArrayView_FromObject(obj), which does this under the covers.
It's basically orthogonal. In other-words, if you defined the array view protocol you would not need the buffer protocol at all. But you could easily define both. -Travis
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
yes, that's what I'm looking for -- please do add that to the C-API
OK, so if one of these were passed into something expecting the buffer protocol, it wouldn't work, but you could make an object conform to both protocols at once -- like numpy does now, I suppose -- very nice. Another question: is this new approach in response to feedback from Guido and/or other Python devs? This sure seems like a good way to go -- though it seems from the last discussion I followed on python-dev, most of the devs just didn't get how useful this would be! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
The new approach is a response to the devs and to Sasha, who had some relevant comments. Yes, I agree that many devs don't "get" how useful it would be because they have not written scientific or graphics-intensive applications. However, Guido is the one who encouraged me at SciPy 2006 to push this, so I think he is generally favorable to the idea. The Python devs will definitely push back. The strongest opposition seems to be from people that don't 'get' it and so don't want "dead interfaces" in Python. They would need to be convinced of how often such an interface would actually get used. I've tried to do that in the rationale, but what would help is many people actually posting to python-dev to support the basic idea (you don't have to support the specific implementation --- most are going to be uncomfortable knowing enough to take a stand). However, there is a great need for people to stand up and say: "We need something like this in Python..." -Travis
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
good sign.
well, none of us want dead interfaces in Python.
let us know when there is a relevant thread to chime in on. However, what we really need is not people like me saying "I need this", but rather people that develop significant existing extension packages saying they'll actually use this in their package. People like:

wxPython -- Robin Dunn
PIL -- Fredrik Lundh
PyOpenGL -- Who?
PyObjC -- would it be useful there? (Ronald Oussoren)
MatplotLib (but maybe it's already married to numpy...)
PyGtk ?

Who else? I know Robin Dunn is interested in using it in wxPython -- but probably only if someone contributes the code. I hope to do that some day, but I'm only barely qualified to do so. Fredrik accepted your submission of code to use the array interface in PIL, but he seemed skeptical of the idea. Perhaps lobbying (or even just surveying) some of these folks would be useful. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Christopher Barker wrote:
It's a good start, but there is also PyMedia, PyVoxel, any video-library interface writers, any audio-library interface writers. Anybody who wants to wrap/write code that does some kind of manipulation on a chunk of data of a specific data-format. There are so many people who would use it that I don't feel qualified to speak for them all. -Travis
![](https://secure.gravatar.com/avatar/5c7407de6b47afcd3b3e2164ff5bcd45.jpg?s=120&d=mm&r=g)
A Divendres 05 Gener 2007 01:36, Travis Oliphant escrigué:
Yeah. I think this is the case for PyTables. However, the PyTables case should be similar to matplotlib: it needs so many features of NumPy that it is hardly conceivable it could live with just an implementation of the array interface. In any case, I think that if the PEP succeeds, it would represent an extraordinary leap towards efficient data interchange in applications that don't need (or are reluctant to include) NumPy for their normal operation. Cheers, --
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
right -- I didn't intend that to be a comprehensive list.
I think this is key -- we all know that there are a lot of people that *could* use it, and we might even say *should* use it. The question that I think the core Python devs want answered is: *will* they use it? That's why I suggest that rather than having a bunch of numpy users make comments to python-dev, we really need authors of packages like the above to make comments to python-dev, saying "I could use this, and I *will* use this if it's put into the standard lib". I do think there is one issue that does need to be addressed. The current buffer protocol already allows modules to share data without copying -- but it doesn't provide any description of that data. This proposal would provide more description of that data, but still not describe it completely -- that's just not possible. So how helpful is the additional description? I think a lot, but others are not convinced. Some examples, from my use. I use wxPython a fair bit, so I'll use that as an example. Example 1: One can now pass a buffer into a wxImage constructor to create an image from existing data. For instance, you can pass in a WxHx3 numpy array of unsigned bytes to create a WxH RGB image. At the moment, all the wx.Image constructor checks is whether you've passed in the correct number of bytes. With this proposal, it could check to see if you passed in a WxHx3 array of bytes. How helpful would that be? It's still up to the programmer to make sure those bytes actually represent what you want. You may catch a few errors, like accidentally passing in a 3xWxH array. You also wouldn't have to pass in the size of the image you wanted -- which would be kind of nice. One could also write the Image constructor so that it could take a few different shapes of data, and do the right thing with each of them. How compelling are these advantages? 
Example 2: When you need to draw something like a whole lot of pixels or a long polyline, you can now pass into wxPython either a list of (x,y) tuples or any sequence of (x,y) sequences. An Nx2 numpy array appears as the latter, but it ends up being slower than the list of tuples, because wxPython has some code to optimize accessing lists of tuples. Internally, wxWidgets has drawing methods that accept an Nx2 C array of ints. With the proposed protocol, wxPython could recognize that such an array was passed in, and save a LOT of sequence unpacking, type checking, converting, etc. It could also take multiple data types -- floats, ints, etc. -- and do the right thing with each of those. This to me is more compelling than the Image example. By the way, Robin Dunn has said that he doesn't want to include a numpy dependency in wxPython, but would probably accept code that did the above if it didn't add any dependencies. Francesc Altet wrote:
That's the question -- is it an extraordinary leap over what you can now do with the existing buffer protocol? Matthew Brett wrote:
Is there already, or could there be, some sort of consortium of these that agree on the features in the PEP?
There isn't now, and that's my question -- what is the best way to involve the developers of some of the many packages that we envision using this?

- random polling by the numpy devs and users?
- more organized polling by numpy devs (or Travis)?
- just a note encouraging them to pipe in on the discussion here?
- a note encouraging them to pipe in on the discussion at python-dev?

I think the PEP has far more chance of success if it's seen as a request from a variety of package developers, not just the numpy crowd (which, after all, already has numpy). -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
Christopher Barker wrote: [SNIP] projects on board would help a lot; it might also reveal some deficiencies in the proposal that we don't see yet. I've only given the PEP a quick read-through at this point, but here are a couple of comments:

1. It seems very numpy-centric. That's not necessarily bad, but I think it would help to have some outsiders look it over -- perhaps they would see things that they need that it doesn't address. Conversely, there may be universal opinion that some parts of it aren't needed, and we can strip the proposal down somewhat.
2. It seems pretty complicated. In particular, the PyDataFormatObject seems pretty complicated. This part in particular seems like it might be a hard sell, so I expect this is going to need considerably more motivation. For example:
   1. Why do we need Py_ARRAYOF? Can't we get the same effect just using longer shape and strides arrays?
   2. Is there any type besides Py_STRUCTURE that can have names and fields? If so, which, and what do they mean? If not, you should just say that.
   3. And on this topic, why a tuple of ([names,..], {field})? Why not simply a list of (name, dfobject, offset, meta), for example? And what's the meta information if it's not PyNone? Just a string? Anything at all?

I'll try to give it a more thorough reading over the weekend. -tim
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Tim Hochberg wrote:
It would help quite a bit. Are there any suggestions of who to recruit to review the proposal? We should not forget that the NumPy world is quite diverse as well.
I've only given the PEP a quick read through at this point, but here a couple of comments:
Thank you for taking the time to read through it. I know it takes precious effort to do all this, which is why it's been so slow in coming from my end. It is important to get a lot of discussion on something like this. A lot of what is in the PEP does stem from discussion that's happened over the past 10 years, but admittedly some of it doesn't (extended data-format descriptions, for example).
Yes, this is true. I took the struct module, NumPy, and c-types as a guide for "what is needed" to be described in terms of memory.
Yes, the PyDataFormatObject is complicated --- but I don't think unnecessarily so. I've already stripped away a lot of what's in NumPy to reduce it. The question really is: how are you going to describe what an arbitrary chunk of memory represents? One could restrict it to primitive types, replace the PyDataFormatObject with the enumerated types, and just give up on describing more complicated structures. But my contention is: why? Numarray, NumPy, and ctypes have already laid a tremendous amount of groundwork in how we can represent complicated data-structures. They clearly exist, so why shouldn't we have some mechanism to describe them? Once you decide to handle complicated types, you need to replace the simple enumerated type with something that is "self-recursive" (i.e., so you can have fields of arbitrary data-types). This lends itself to some kind of structure design like the PyDataFormatObject. The only difference between what I've proposed and the ctypes approach is that ctypes overloads Python type objects. (In other words, the equivalent of the PyDataFormatObject in ctypes is at its core a PyTypeObject, while here it is built on PyObject.)
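The "self-recursive" requirement Travis describes can be sketched in a few lines of Python. Everything here is illustrative -- the class and attribute names are loosely modeled on the PyDataFormatObject in the PEP but are not part of the proposal:

```python
# Hypothetical sketch of a self-recursive data-format object.  A field's
# description is itself a DataFormat, which is what lets structures nest
# to arbitrary depth (the key point of Travis's argument above).

class DataFormat:
    def __init__(self, kind, itemsize, names=None, fields=None):
        self.kind = kind          # e.g. 'int', 'float', 'structure'
        self.itemsize = itemsize  # total size in bytes
        self.names = names or []  # field order, for offset-order traversal
        # fields maps name -> (DataFormat, offset, meta); the DataFormat
        # value is what makes the description self-recursive.
        self.fields = fields or {}

int32 = DataFormat('int', 4)
float64 = DataFormat('float', 8)

# A structure whose second field is itself a structure:
point = DataFormat('structure', 16,
                   names=['x', 'y'],
                   fields={'x': (float64, 0, None),
                           'y': (float64, 8, None)})
record = DataFormat('structure', 20,
                    names=['id', 'pos'],
                    fields={'id': (int32, 0, None),
                            'pos': (point, 4, None)})

# Traverse in field order using the names list:
offsets = [record.fields[n][1] for n in record.names]
print(offsets)  # [0, 4]
```

A flat enumeration of primitive types could not express the `pos` field here; the recursion is what buys the extra descriptive power.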
1. Why do we need Py_ARRAYOF? Can't we get the same effect just using longer shape and strides arrays?
Yes, this is true for a single data-format in isolation (and in fact exactly what you get when you instantiate, in NumPy, a data-type that is an array of another primitive data-type). However, how do you describe a structure whose second field is an array of a primitive type? This is where the ARRAYOF qualifier is needed. In NumPy it's actually not done this way; a separate subarray field in the data-type object is used instead. After studying ctypes, however, I think this approach is better.
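The stdlib `struct` module can illustrate why the array qualifier can't be folded into the outer shape and strides -- the array lives *inside* one item of the outer type (a hedged sketch; the struct layout is made up for the example):

```python
import struct

# A C struct like:
#     struct rec { int32_t id; double samples[4]; };
# packed without padding.  The "4d" inside the format string plays the
# role of the ARRAYOF qualifier: the 4-element array is part of a single
# item of the outer structure, so enlarging the outer shape/strides
# cannot describe it.
fmt = '=i4d'
print(struct.calcsize(fmt))   # 36 = 4 + 4*8

packed = struct.pack(fmt, 7, 1.0, 2.0, 3.0, 4.0)
unpacked = struct.unpack(fmt, packed)
print(unpacked)  # (7, 1.0, 2.0, 3.0, 4.0)
```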
Yes, you can add fields to a multi-byte primitive if you want. This would be similar to thinking about the data-format as a C-like union. Perhaps the data-field has meaning as a 4-byte integer but the most-significant and least-significant bytes should also be addressable individually.
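The union-like overlay Travis describes -- a 4-byte integer whose most- and least-significant bytes are individually addressable -- can be shown with plain Python byte operations (a sketch only; no PEP machinery involved):

```python
# One 4-byte big-endian integer, viewed both as a whole and as its
# individual bytes -- the same memory, two overlapping "fields".
value = 0x12345678
raw = value.to_bytes(4, 'big')

msb, lsb = raw[0], raw[-1]
print(hex(msb), hex(lsb))  # 0x12 0x78

# The same four bytes reinterpreted as the full integer again:
assert int.from_bytes(raw, 'big') == value
```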
The list of names is useful for having an ordered list so you can traverse the structure in field order. It is technically not necessary, but it makes it a lot easier to parse a data-format object in offset order (it is used a bit in NumPy, for example). The meta information is a placeholder for field tags and future growth (kind of like column headers in a spreadsheet). It started as a place to put a "longer" name, or to pass along information about a field (like units). -Travis
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/6/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Before I can answer that, I need to ask you a question. How do you see this extension to the buffer protocol? Do you see it as a supplement to the earlier array protocol, or do you see it as a replacement? The reason I ask is that the two projects I use regularly, wxPython and PIL, generally operate on relatively large data chunks, and it's not clear that they would see much benefit from this mechanism versus the array protocol. I imagine that between us Chris Barker and I could hack together something for wxPython (not that I've asked him about it). And code would probably go a long way to convincing people what a great idea this is. However, all else being equal, it'd be a lot easier to do this for the array protocol, since there's no extra infrastructure involved. [SNIP]
OK. Needed for recursive data structures, check.
Hmm. I think I understand this somewhat better now, but I can't decide if it's cool or overkill. Is this a supporting a feature that ctypes has?
Right, I got that. Between names and fields you are simulating an ordered dict. What I still don't understand is why you chose to simulate this ordered dict using a list plus a dictionary, rather than a list of tuples. This may well just be a matter of taste. However, for the small sizes I'd expect of these lists, I would expect a list of tuples to perform better than the dictionary solution.
FWIW, the array protocol PEP seems more relevant to what I do, since I'm sending big chunks of data back and forth and am not much concerned with the per-access overhead. -tim
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Timothy Hochberg wrote:
This is a replacement for the previously described array protocol PEP. This is how I'm trying to get the array protocol into Python. In that vein, it has two purposes. One is to make a better buffer protocol that includes a conception of an N-dimensional array in Python itself. If we can include this in Python, then we get a lot of mileage out of all the people who write extension modules for Python that should really be making their memory available as an N-dimensional array (every time I turn around there is a new wrapping of some library that is *not* using NumPy as the underlying extension). With the existence of ctypes it just gets worse, as nobody thinks about exposing things as arrays anymore, and so NumPy users don't get the ease of use we would have if the N-dimensional array concept were a part of Python itself.

For example, I just found the FreeImage project, which wraps a nice library using ctypes. But it doesn't have a way to expose these images as numpy arrays. Now, it would probably take me only a few hours to make the connection between FreeImage and NumPy, but I'd like to see the day when it happens without me (or some other NumPy expert) having to do all the work. If ctypes objects exposed the extended buffer protocol for appropriate types, then I wouldn't have to do anything, because the wrapped structures would be exposable as arrays, and all of a sudden I say a = array(freeimobj) and I can do math on the array in Python. Or, if I'm an extension module writer, I don't need to have NumPy (or rely on it) in order to do some computation on freeimobj in C itself. Sure, you can do it now (if the array protocol is followed --- but not many people have adopted it yet --- some have argued that it's "not in Python itself"). So, I guess, the big reason I'm pushing this is largely marketing: the buffer protocol is the "right" place to put the array protocol. The second reason is to ensure that the buffer protocol itself doesn't "disappear" in Python 3000.
Not all the Python devs seem to really see the value of it. But, it can sometimes be unclear as to what the attitudes are.
I don't know. It's basically a situation where it's easier to support it than to not and so it's there.
Ah, I misunderstood. You are right that if I had considered needing an ordered list of names up front, this kind of thing would make more sense. I think the reason for the choice of a dictionary is that I was thinking of field access as attribute look-up, which is just dictionary look-up, so conceptually that was easier for me. But tuples probably have less overhead (especially for small numbers of fields), at the expense of having to search for the field name on each access. I'm trusting that dictionaries (especially small ones) are pretty well optimized in Python (though I haven't tested that assertion in this particular case).
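The two field layouts under discussion can be put side by side in plain Python (the field contents are made up; neither layout is normative):

```python
# (a) list of names + dict, as in the PEP draft: O(1) lookup by name.
#     Both describe: 'id' (int32) at offset 0, 'val' (float64) at offset 4.
names = ['id', 'val']
fields = {'id': ('i4', 0, None), 'val': ('f8', 4, None)}

# (b) flat list of tuples, as Tim suggests: simpler structure, but name
#     lookup is a linear search.
field_list = [('id', 'i4', 0, None), ('val', 'f8', 4, None)]

# Both support traversal in offset order:
order_a = [fields[n][1] for n in names]
order_b = [offset for _, _, offset, _ in field_list]
assert order_a == order_b == [0, 4]

# Name lookup, the operation whose cost differs between the two:
assert fields['val'][0] == 'f8'
assert next(t for t in field_list if t[0] == 'val')[1] == 'f8'
```

For the handful of fields typical of a record type, either layout is cheap; the dict wins only when structures carry many fields and are accessed by name often.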
This proposal is trying to get the array protocol *into* Python. So, this is the array protocol PEP. Anyone supportive of the array protocol should be interested in and thinking about this PEP. -Travis
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Timothy Hochberg wrote:
But is this mechanism any harder? It doesn't look like it to me. In fact, as I have written a tiny bit of Numeric extension code, this looks familiar and pretty easy to work with.
I imagine that between us Chris Barker and I could hack together something for wxPython (not that I've asked him about it).
I'm not sure when I'll find the time, but I do want to do this.
Is it that much infrastructure? It looks like this would, at the least, require an extra include file. If this flies, will that be delivered with Python 2.8? Until then (and for older Pythons), would various extension writers all need to add this extra file to their source? And might we get a mess, with different versions floating around out there trying to interact?
That's the biggest issue, but I think a lot of us use a lot of small arrays as well -- and while I don't know if it's a performance hit worth worrying about, it's always bugged me that it is faster to convert an array to a Python list and pass that to wxPython than it is to just pass in the array directly. -Chris -- Christopher Barker, Ph.D. Oceanographer, NOAA/OR&R/HAZMAT
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/9/07, Christopher Barker <Chris.Barker@noaa.gov> wrote:
Let me preface my remarks by saying that I was initially assuming this was meant as a supplement to the earlier array protocol proposal, not a replacement, as Travis subsequently explained.

But is this mechanism any harder? It doesn't look like it to me. In fact, as I have written a tiny bit of Numeric extension code, this looks familiar and pretty easy to work with.
I expect that the old proposal is easier to implement right now. We could implement the old array protocol in wxPython and have fairly seamless integration with numpy without any dependencies. To implement the new protocol, we'd need the C-API. Since that's not in Python at the moment, we'd have to include implementations of the various functions in wxPython. I suppose that wouldn't really be that bad, assuming they are already implemented somewhere. It's a bit of a chicken-and-egg problem, though.
I'm not sure. It may not be that bad. I'd guess you'd need both an include file and a source file for the implementations of the functions.
Possibly. It shouldn't be a big deal if the API is frozen. But I expect the best thing to get this to work would be to implement this for as many projects as possible as trial patches before trying to get this into those projects officially. That way we can get some experience, tweak the API if necessary, then freeze it and release it officially. Like I said, I'll help with wxPython. I'm tempted to try with PIL as well, but I've never looked at the code there, not even tried to compile it, so I don't know how far I'd get.
You should have seen it in the old days before I sped it up. Actually, I think you probably did. Anyway, it seems like wxPython is low-hanging fruit in the sense that we could probably get it done without too much trouble. It's possible that Robin may not accept the patch until the relevant code goes into Python, but just having a patch available would be a useful template for other projects, and would show the performance gains this approach would lead to. At least I sure hope so. Travis: does the code implementing the C API exist already, or is that something that still needs to be written? -tim
![](https://secure.gravatar.com/avatar/c7976f03fcae7e1199d28d1c20e34647.jpg?s=120&d=mm&r=g)
A few comments regarding what I think justifies some sort of standard being part of Python (understanding that there are various ways it could be done, so I'm not commenting on the specifics here directly). I don't think there is any harm in making the standard numpy-centric. In fact, I think the selling point is that this standard means that any extension can expose repetitive data to those who want to manipulate it *in Python* in a simple and standard way. While it's possible that ultimately there are those who will pass these sorts of things from one extension into another, I don't see that as a common use for a while. What it does mean is that if you want a simple Python-only way of seeing and modifying such data, all you need to do is install numpy. You don't have to write a C extension. If extension writers use descriptive field names, they can make the arrays they expose somewhat self-documenting, so that turning one into an array and doing a little introspection may tell you all you need to know about the data. Rather than try to sell this as some neutral interface, I would make the numpy dependence explicit (not that it excludes extension-to-extension direct use). It may be that the developers of various extensions are not the ones most interested in this capability (after all, they've built it to do what they want), but I wouldn't be surprised if many users of those extensions would like it so they can do things the extension doesn't allow. So one approach is to see what the respective user communities think about such capabilities. If they find out what this can do for them, they may pressure the developers for such support. People in the numpy community can also volunteer to implement the standard (but with this approach it's a bit of a chicken-and-egg thing, as someone has mentioned: you can't do it if it isn't in Python yet).
I do agree that the most persuasive approach would be to have at least some of the 3rd party extensions support this explicitly on the python-dev list. Perry
![](https://secure.gravatar.com/avatar/2d1562d092a4d90284163439d5596556.jpg?s=120&d=mm&r=g)
On Thursday 04 January 2007 19:36, Travis Oliphant wrote:
Two more places to look for projects that may be interested: SQL wrappers, such as Psycopg2, and the Python DB API 2.0 community QuantLib (see the message below from the enthought-dev mailing list.) On Saturday 03 February 2007 00:23, Prabhu Ramachandran wrote:
![](https://secure.gravatar.com/avatar/b24e93182e89a519546baa7bafe054ed.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
It would help me to understand the proposal if it could be explained in terms of the methods of the existing buffer class/type: ['__add__', '__class__', '__cmp__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__len__', '__mul__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__str__'] Numpy extends numarray's type/dtype object. This proposal appears to revert to the old letter codes. I have had very limited experience with C. Colin W.
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Colin J. Williams wrote:
It extends what is done in the array and struct modules of Python. The old letter codes are useful at the C level. They are 'hidden' behind an enumeration, however, and so should not be a big deal. But the letter codes are still useful in other contexts.
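Travis's point -- letter codes surviving underneath a friendlier surface -- is visible in the stdlib today; a small sketch (nothing here is part of the PEP):

```python
import array
import struct

# The struct/array-module letter codes Travis refers to.  With the '='
# prefix, struct uses standard (platform-independent) sizes, which is
# the kind of fixed mapping an enumeration can hide at the C level.
for code in 'bhilfd':
    print(code, struct.calcsize('=' + code))

# The same codes still drive the stdlib array type:
a = array.array('d', [1.0, 2.0, 3.0])
assert a.itemsize == struct.calcsize('=d')
```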
I have had very limited experience with C.
Then this proposal will not be meaningful for you. This is a proposal to extend something on the C-level. There is nothing on the Python level suggested by this proposal. -Travis
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
I'm wondering if having the buffer object specify the view is the right choice. I think the best choice is to separate the design into:

- buffer: provides an interface to memory
- array: provides a view of memory as an array of whatever dimensions

1. buffer may or may not map to contiguous memory.
2. multiple views of the same memory can be shared. These different views could represent different slicings.
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
Several extensions to Python utilize the buffer protocol to share the location of a data-buffer that is really an N-dimensional array. However, there is no standard way to exchange the additional N-dimensional array information so that the data-buffer is interpreted correctly. I am questioning whether this is the best concept. It says that the data-buffer will carry the information about its interpretation as an N-dimensional array. I'm thinking that a buffer is just an interface to memory, and that the interpretation as an array of n dimensions, for example, is best left to the application. I might want at one time to view the data as n-dimensional, but at another time as 1-dimensional, for example.
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On 1/5/07, Stefan van der Walt <stefan@sun.ac.za> wrote:
I think Neal is suggesting some object that basically does nothing but hold a pointer (or pointers) to memory. This memory can be used in various ways, one of which is to construct another type of object that provides a view with indices and such, i.e., an array. That way the memory isn't tied to arrays and could conceivably be used in other ways. The idea is analogous to the data/model/view paradigm. It is a bit cleaner than just ignoring the array parts. Chuck
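The storage/view separation Chuck describes can be sketched in a few dozen lines of Python. All the names here are illustrative -- this is a toy model of the idea, not anything from the PEP:

```python
import struct

class Memory:
    """Raw storage: nothing but a block of bytes."""
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)

class ArrayView:
    """Separate object: a strided 2-D float64 interpretation of a Memory."""
    def __init__(self, mem, shape, strides, offset=0):
        self.mem, self.shape = mem, shape
        self.strides, self.offset = strides, offset

    def _pos(self, i, j):
        return self.offset + i * self.strides[0] + j * self.strides[1]

    def __getitem__(self, idx):
        return struct.unpack_from('=d', self.mem.data, self._pos(*idx))[0]

    def __setitem__(self, idx, value):
        struct.pack_into('=d', self.mem.data, self._pos(*idx), value)

mem = Memory(2 * 3 * 8)                       # storage for 6 doubles
view = ArrayView(mem, shape=(2, 3), strides=(24, 8))
view[1, 2] = 42.0

# A second view of the *same* memory, transposed just by swapping strides:
transposed = ArrayView(mem, shape=(3, 2), strides=(8, 24))
assert transposed[2, 1] == 42.0
```

The point of the exercise: the two views share one storage object, and neither the storage nor the views needed to know about each other in advance.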
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Neal Becker wrote:
The simple data-buffer interpretation is still there. You can still use the simple "chunk-of-memory"-only interpretation of the buffer. All we are doing is adding a way for applications to ask whether the object can be interpreted as a strided N-dimensional array of a particular data-format. So this proposal does nothing to jeopardize the buffer-as-an-interface-to-memory-only model. I'm only using a table of function pointers that is already there (tp_as_buffer), rather than requesting an additional table of function pointers on the type object (tp_as_array). I see the array view idea as fitting very nicely with the buffer protocol. -Travis
![](https://secure.gravatar.com/avatar/764323a14e554c97ab74177e0bce51d4.jpg?s=120&d=mm&r=g)
Neal Becker wrote:
Sure, but you need a standard way to communicate that extra information between different parts of your code and also between different third party libraries. That is what this PEP intends to provide. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/4/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
still be used to describe a complicated block of memory to another user.
Thinking of the scope "seamless data exchange between modules", my concern with this PEP is that it might be too focused on "block of memory" rather than "access to data". Data that can be interpreted as an n-dimensional array doesn't necessarily have to be represented directly as a block of memory.

Example 1: We have a very large amount of data with a compressed internal representation.
Example 2: We might want to generate data "on the fly" as it's needed.
Example 3: If module creators have to deal with different byte alignments, contiguousness etc., it'll lead to lots of code duplication and unnecessarily much work.

Is it possible to add a data-access API to this PEP? Direct memory access could be available through this API with a function that returns the memory address (or NULL if not available). We could have a default implementation for basic types, with the option for module creators to override it. The problem with this, if we stick to the buffer protocol, is that it breaks the concept "buffer is memory", if that ever was valid. This is of minor concern for me, though.
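Torgil's Example 1 -- a compressed internal representation exposed through an access function rather than a raw pointer -- can be sketched like this (all names are illustrative, and a real implementation would decompress lazily rather than per call):

```python
import struct
import zlib

# One million-ish items is overkill for a sketch; 1000 int32s suffice.
values = list(range(1000))
compressed = zlib.compress(struct.pack('=1000i', *values))

class CompressedArray:
    """Array-like object with no directly addressable memory block."""
    shape = (1000,)

    def __init__(self, blob):
        self._blob = blob

    def get_from_index(self, i):
        # Decompress on demand; a real version would cache or chunk this.
        raw = zlib.decompress(self._blob)
        return struct.unpack_from('=i', raw, i * 4)[0]

    def get_memory_layout(self):
        # The NULL case from Torgil's proposal: no direct memory access.
        return None

arr = CompressedArray(compressed)
assert arr.get_from_index(500) == 500
assert arr.get_memory_layout() is None
```

A consumer that only understands raw memory would see the `None` and fall back to item-by-item access; a memory-backed producer would return its address, strides, and flags instead.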
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Torgil Svensson wrote:
Could you give an example of what you mean? I have no problem with such a concept. I'm mainly interested in getting the NumPy memory model into Python somehow. I know it's not the "only" way to think about memory, but it is a widely used and useful way. -Travis
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/11/07, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Sure. I'm not objecting to the memory model; what I mean is that data access between modules has a wider scope than just a memory model. Maybe I'm completely out of scope here; I thought this was worth considering for the inter-module data-sharing scope. Say we want to access a huge array with 1 million text strings from another module that has a compressed representation in memory. Here's a pseudo-code example, with most of the details completely made up:

```c
buffer = AnotherModule_GetBigArrayAsBuffer();
aview = buffer->bf_getarrayview();
indexes = NewList();
for (i = 0; i < aview->shape[0]; ++i)
    for (j = 0; j < aview->shape[1]; ++j) {
        item = aview->get_from_index(i, j);
        /* item represents the data described by the PyDataFormatObject */
        if (is_interesting_item(item))
            ListAdd(indexes, NewList(i, j));
    }
indexarr = Numpy_ArrayFromLists(indexes);
```

Here, we don't have to care about any data-layout issues; the called module could even produce data on the fly. If I want direct memory access, we could use a function that returns data, strides, and flags.
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On 1/11/07, Torgil Svensson <torgil.svensson@gmail.com> wrote:
This is where separating the memory block from the API starts to show advantages. OTOH, we should try to keep this all as simple and basic as possible. Trying to design for every potential use will lead to over-design; it is a fine line to walk. <snip> Chuck
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/11/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
I agree. I'm trying to look after a use case of my own here, where I have a huge array (that won't fit in memory) with data that is very easy to compress (and easily fits in memory compressed). OTOH, I have as yet no need to share this between modules, but a simple data-access API opens up a variety of options. In my mindset, I can slice and dice my huge array, and the implementation behind the data-access API will choose between having the views represented internally as intervals or as lists of indexes. So I'm +1 for having all information concerning nd-array access on a logical level (shapes) in one API, and letting the memory-layout details (strides, FORTRAN, C etc.) live in another API; a module that wants to skip the API overhead (numpy) can always do something like:

```c
memory_interface = array_interface->get_memory_layout();
if (memory_interface) {
    /* ... use memory_interface->strides ... etc. */
} else {
    /* ... use array_interface->get_item_from_index() ... etc. */
}
```

I'm guessing that most modules trying to access an array will choose to go through numpy for fast operations. Another use of such an API is to do things like give an "RGB" view of an image, regardless of whatever weird image format lies below, without having to convert the whole image in memory and lose precision or memory. If we want a whole in-memory RGB copy, we could just take the RGB view, pass it to numpy, and force numpy to do a copy. The module can then, in either case, operate on the image through numpy or return a numpy object to the user. (numpy is, of course, integrated in Python by then.)
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Torgil Svensson wrote:
I think this is a good idea generally. I think the PIL would be much more open to this kind of API, because the memory model of the PIL is different from ours. On the other hand, I think it would be a shame not to provide a basic N-d array memory model like NumPy's, because it is used so often.
I had originally thought to separate these out into multiple calls anyway. Perhaps we could propose the same thing: have a full struct interface as one option, and a multiple-call interface like you propose as another.
array_interface->get_block_from_slice() ? Such a thing would be very useful for all kinds of large data-sets, from images, and videos, to scientific data-sets.
Getting this array_interface into Python goes a long way into making that happen, I think. -Travis
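The `get_block_from_slice()` idea floated above can be sketched in a few lines of Python. The method name comes from the thread; the class and everything else here are made up for illustration:

```python
class ChunkedSource:
    """A producer too large to expose as one memory block; it hands out
    one contiguous block per request instead."""
    def __init__(self, n):
        self._n = n

    def get_block_from_slice(self, start, stop):
        # Generate (or load from disk/network) only the requested block.
        return [i * i for i in range(start, min(stop, self._n))]

src = ChunkedSource(10**6)

# Consume the first 1000 items in 256-item blocks:
total = 0
for start in range(0, 1000, 256):
    block = src.get_block_from_slice(start, min(start + 256, 1000))
    total += sum(block)

assert total == sum(i * i for i in range(1000))
```

The consumer never needs more than one block in memory at a time, which is the property that makes this attractive for images, video, and large scientific data sets.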
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
I believe we are converging, and this is pretty much the same design as I advocated. It is similar to boost::ublas. Storage is one concept; interpretation of the storage is another. Numpy is a combination of a storage and an interpretation. Storage could be dense or sparse, and allocated in various ways. Sparse can be implemented in different ways. Interpretation can be 1-d or 2-d, zero-based or non-zero-based. There is also the question of ownership (slices).
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Neal Becker wrote:
I believe we are converging, and this is pretty much the same design as I advocated. It is similar to boost::ublas.
I'm grateful to hear that. It is nice when ideas come from several different corners.
How do we extend the buffer interface, then? Do we have one API that allows sharing of storage and another that handles sharing of interpretation? How much detail should be in the interface regarding storage? Is there a possibility of having at least a few storage models "shareable", so that memory can be shared by others that view the data in the same way? -Travis
![](https://secure.gravatar.com/avatar/0b7d465c9e16b93623fd6926775b91eb.jpg?s=120&d=mm&r=g)
Travis Oliphant wrote:
How about:

1. A memory concept, of which buffer is an example.
2. A view concept.
3. A variety of common concrete types composing 1+2.

So then, how do we use buffer in this scheme? I'm thinking that buffer isn't really the best thing to build on -- but within this scheme, buffer is a kind of memory (assuming it provides, or could be made to provide, the required interface). The view is not part of buffer (as was proposed) but a separate piece. Still, I agree that we want a commonly used array object that includes both the memory and the view. I propose that we build it out of these more generic pieces, but also provide commonly used compositions of them. I think this satisfies the desire for a self-describing array component, while allowing more flexibility and serving wider usage.
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/12/07, Travis Oliphant <oliphant@ee.byu.edu > wrote:
I'm concerned about the direction this PEP seems to be going. The original proposal was borderline too complicated, IMO, and now it seems headed toward more complexity. Also, it seems that three different goals are getting conflated here. None are bad, but they don't, and probably shouldn't, all be addressed by the same PEP:

1. Allowing producers and consumers of blocks of data to share blocks efficiently. This is half of what the original PEP proposed.
2. Describing complex data types at the C level. This is the other half of the PEP [1].
3. Things that act like arrays but have different storage methods. The details of this still seem pretty vague, but to the extent that I can figure them out, it doesn't seem useful or necessary to tie this into the rest of the array-interface PEP. For example, "array_interface->get_block_from_slice()" has been mentioned. Why that instead of "PyObject_AsExtendedBuffer(PyObject_GetItem(index), ....)" [2]?

I'll stop here till I see some more details of what people have in mind, but at this point I think that alternative memory models are a different problem that should be addressed separately. Sadly, I'm leaving town shortly and running out of time, so I'll have to leave my objections in this somewhat vague state.

Oh, the way that F. Lundh plans to expose PIL's data a chunk at a time is mentioned in this python-dev summary: http://www.python.org/dev/summary/2006-11-01_2006-11-15/ It doesn't seem necessary to have special support for this; all that is necessary is for the object returned by acquire_view to support the extended array protocol.

[1] Remind me again why we can't simply use ctypes for this? It's already in the core. I'm sure it's less efficient, but you shouldn't need to parse the data-structure information very often. I suspect that something that leveraged ctypes would meet less resistance.

[2] Which reminds me.
I never saw in the PEP what the actual call in the buffer protocol was supposed to look like. Is it something like:

```c
PyObject_AsExtendedBuffer(PyObject *obj, void **buffer,
                          Py_ssize_t *buffer_len,
                          funcptr *bf_getarrayview,
                          funcptr *bf_relarrayview)
```

? -- //=][=\\ tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/3f692386259303b90d1967ad662a9eb0.jpg?s=120&d=mm&r=g)
On 1/12/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
Looks like an array, acts like an array, smells like an array = is an array.
What is an "Extended buffer" ? Connecting that to array information doesn't feel intuitive.
I agree
[1] Remind me again why we can't simply use ctypes for this?
1. ctypes is designed for "C types", not array layout.
2. Managing/creating complex formats in ctypes is considerably less clean, intuitive, and simple than with dtypes => ugly code.
3. Can ctypes handle anonymous lambda function pointers?
the core. I'm sure it's less efficient, but you shouldn't need to parse the data structure information very often.
I believe that'll be more common than you think; for example dynamically creating/combining/slicing recarrays with various data. //Torgil
![](https://secure.gravatar.com/avatar/5b2449484c19f8e037c5d9c71e429508.jpg?s=120&d=mm&r=g)
On 1/12/07, Torgil Svensson <torgil.svensson@gmail.com> wrote:
I was unclear here. I didn't mean like that it would be infrequent in "once a month" sense. I meant that you would only need to look at the data structure information once per set of data that you are accessing and that you would typically extract many chunks of data from each set, so the amortized cost of parsing the data structure would be small. Trying to get out the door.... -- //=][=\\ tim.hochberg@ieee.org
![](https://secure.gravatar.com/avatar/4d021a1d1319f36ad861ebef0eb5ba44.jpg?s=120&d=mm&r=g)
Well at least people are talking about what they would like to see. But, I think we should reign in the discussion.
I'm leaning this way too.
Yes, I agree.
Two reasons: 1) ctypes wasn't designed for this purpose specifically, and leaves out certain things; 2) ctypes uses many Python types instead of just a single Python type (the PyDataFormatObject).
No, not like that. The bf_getarrayview function pointer hangs off the as_buffer method table, which is pointed to by the type object. You could always access the API through those function pointers directly, but it is more traditional to use an API call that adds some checking to make sure the function pointer is there before calling it. I don't know if I go into a lot of detail there, but I should probably add more. PEPs are rather "expensive" for me in terms of how much immediate benefit the change to Python brings me personally versus the time spent writing them. The benefit here is much more long-term: establishing a useful data model that could be used by a lot of applications in Python to exchange data (and help ameliorate the proliferation of objects in Python that are essentially, and should be, NumPy arrays). -Travis
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
Talking about the difference between the memory access model and the array API: maybe I am talking bullshit (I know next to nothing about these problems), but couldn't an efficient tree data structure be implemented on the memory buffer object? I am pretty sure a simple read-only tree could; as for a tree that is edited, I am not so sure. Anyhow, read-only trees are used a lot by some people. A lab next to mine uses them to describe results from their experiments. They store events in tree-like structures (I have been told they copied that from CERN). They can then walk through the tree in a very efficient way and do statistical analysis on their collection of events. I am not sure if this can fit anywhere in the PEP, but it would sure enlarge its scope. Please enlighten me. Gaël
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On 1/12/07, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
Trees are nice, but they are not efficient for array-type data. Traversing a tree usually requires some sort of stack (recursion), and a tree is not well structured for addressing data using indices. They just aren't appropriate for arrays; arrays are better represented by some sort of lattice.

> Anyhow, read-only trees are used a lot by some people. A lab next to mine [...]

Probably from ROOT?

> I am not sure if this can fit anywhere in the PEP, but it would sure enlarge its scope.

There is probably a tree module somewhere for Python. Chuck
![](https://secure.gravatar.com/avatar/5c9fb379c4e97b58960d74dcbfc5dee5.jpg?s=120&d=mm&r=g)
On Fri, Jan 12, 2007 at 12:44:15AM -0700, Charles R Harris wrote:
Yes, indeed. I was just wondering if the PEP could be used for a performant implementation of trees. Basically that is mapping a tree to an array, which is possible. As far as performance goes, I think this is not performant at all when modifying the tree, but I do not know whether efficient traversal of the tree is possible once it is mapped to an array.
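For the read-only case Gaël describes, one classic tree-to-array mapping is the implicit "heap" layout: node `i`'s children sit at indices `2i+1` and `2i+2`, so traversal needs no pointers and no recursion, just index arithmetic over a flat buffer. A minimal sketch (the 7-element tree is made-up example data):

```c
/* A complete binary search tree stored in heap layout:
 * node i's children are at 2*i+1 (left) and 2*i+2 (right).
 *
 *           8
 *         /   \
 *        4     12
 *       / \   /  \
 *      2   6 10  14
 */
static const int tree[] = {8, 4, 12, 2, 6, 10, 14};
enum { TREE_N = 7 };

/* Iterative search: walk from the root by index arithmetic alone.
 * Falling off the end of the array means the key is absent. */
int tree_contains(int key)
{
    int i = 0;
    while (i < TREE_N) {
        if (tree[i] == key)
            return 1;
        i = (key < tree[i]) ? 2 * i + 1 : 2 * i + 2;
    }
    return 0;
}
```

As Gaël suspects, this layout is cheap to traverse but expensive to modify: inserting or deleting a node can force a wholesale rebuild of the array, which is why it suits read-only event data well.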
Probably from ROOT?
Yes. It seems to be nice software for such things. The problem with it is that you have to learn C++, and experience shows that not everybody in an experimental lab is willing to do so. Gaël
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Gael Varoquaux wrote:
Yes, indeed. I was just wondering if the PEP could be used for a performant implementation of trees.
That would be a whole new PEP, and one we're not the least bit ready for.
Basicaly that is mapping a tree to an array, which is possible.
Possible, but probably not very useful for dense data -- maybe for sparse arrays? The idea of an array API, rather than (actually, in addition to) an array data structure, is fabulous! It could be used for sparse arrays, for instance. I do think it's a topic for another PEP, and probably not even a PEP until we have at least some working code -- maybe a sparse array and/or PIL image?
I think a slicing API is critical -- at least at the Python level, though at the C level it would sure be nice, and it probably could allow for some good optimizations for getting a "block" of data out of some odd data structure. "Simple is better than complex." "Although practicality beats purity." "Now is better than never." That tells me we should just focus on the array data structure for the PEP now. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
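On the optimization Chris mentions: for a strided view like the proposed PyArrayViewObject, getting a "block" of data out needs no copy at all -- a slice is just a shifted data pointer plus rescaled shape and strides. A sketch with simplified, hypothetical types (`View2D`, `slice_rows`), not the PEP's actual API:

```c
#include <stddef.h>

/* A minimal 2-d view in the spirit of the proposed PyArrayViewObject:
 * just a data pointer plus per-dimension shape and strides (in bytes). */
typedef struct View2D {
    char     *data;
    ptrdiff_t shape[2];
    ptrdiff_t strides[2];
} View2D;

/* Read element (i, j) through the strides -- works for any layout. */
static int view_get(const View2D *v, ptrdiff_t i, ptrdiff_t j)
{
    return *(const int *)(v->data + i * v->strides[0]
                                  + j * v->strides[1]);
}

/* Slice rows [start:stop:step] without touching the data: shift the
 * pointer and rescale shape[0]/strides[0].  Assumes step > 0. */
static View2D slice_rows(View2D v, ptrdiff_t start, ptrdiff_t stop,
                         ptrdiff_t step)
{
    View2D out = v;
    out.data       = v.data + start * v.strides[0];
    out.shape[0]   = (stop - start + step - 1) / step;
    out.strides[0] = v.strides[0] * step;
    return out;
}

/* A 4x3 C-contiguous int array to slice (example data). */
static int grid[4][3] = {{0, 1, 2}, {10, 11, 12}, {20, 21, 22}, {30, 31, 32}};
static View2D grid_view = {
    (char *)grid, {4, 3}, {3 * sizeof(int), sizeof(int)}
};
```

For example, `slice_rows(grid_view, 1, 4, 2)` yields a 2x3 view over rows 1 and 3 of the same memory -- which is why a C-level slicing API can hand blocks of an odd data structure to a consumer without any copying.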
participants (15)
-
Charles R Harris
-
Christopher Barker
-
Colin J. Williams
-
Francesc Altet
-
Gael Varoquaux
-
Matthew Brett
-
Michael McLay
-
Neal Becker
-
Perry Greenfield
-
Robert Kern
-
Stefan van der Walt
-
Tim Hochberg
-
Timothy Hochberg
-
Torgil Svensson
-
Travis Oliphant