[Python-3000] PEP Draft: Enhancing the buffer protocol

Wed Feb 28 04:38:39 CET 2007

On 2/27/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> About the only issue I can see with implementing the mechanism as you
> describe is that everything that wants to offer the buffer interface
> would need to store its data in a PyArray structure.  Bytes, unicode,
> array.array, mmap, etc.  Most of the difference will essentially be a
> call to PyArray_New() rather than PyMalloc(), and an indirection via
> macro of PyArray_ASSTRINGANDSIZE() to get the pointer and length of the
> buffer. I would suspect that such overhead would be minimal, but without
> implementing and testing it on something that is used often (maybe
> Python 2.x strings as the simplest example?), it would be hard to say.

Each type can implement its own PyArray subtype, so there'd be no
need for a macro/function to do the indirection.  For example, if we
wanted to build a C integer-based array type for some reason, we
could create its PyArray subtype as follows:

typedef struct {
    PyObject_HEAD
    int ival[1];  /* first element; the rest follow in the same allocation */
} PyIntArray;

The data can then be accessed cleanly like this:

PyIntArray *my_array = allocate_some_memory();
my_array->ival[some_index] = v;
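
As a rough, self-contained sketch of this layout (not the proposed PyArray
API): the names FakeHead, IntArraySketch, and int_array_new below are
hypothetical, FakeHead stands in for PyObject_HEAD so the code compiles
without Python.h, and a C99 flexible array member (ival[]) is used in place
of ival[1].  The header and the elements live in a single allocation, so no
accessor macro is needed to reach the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject_HEAD (assumption, not CPython's layout). */
typedef struct {
    long refcnt;   /* reference count */
    void *type;    /* placeholder for the type pointer */
} FakeHead;

/* Header followed directly by the element storage. */
typedef struct {
    FakeHead head;
    int ival[];    /* C99 flexible array member; the post uses ival[1] */
} IntArraySketch;

/* Allocate the header plus n zero-initialized ints in one block.
   Returns NULL if the allocation fails. */
static IntArraySketch *int_array_new(size_t n)
{
    IntArraySketch *a;

    assert(n > 0);
    a = calloc(1, offsetof(IntArraySketch, ival) + n * sizeof(int));
    if (a != NULL)
        a->head.refcnt = 1;
    return a;
}
```

With this, int_array_new(4) followed by a->ival[3] = 42 reads and writes
the elements directly; offsetof(IntArraySketch, ival) is the fixed distance
between the start of the object and the data.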

Possibly on some architectures accessing the data will be very
slightly slower because ival isn't at the top of the structure.  I
wrote a short test program just now and didn't see a difference on my
architecture (Intel Duo).

> The benefit to implementing the interface as described by Travis is that
> if an object is read-only (like unicode), the acquire/release is (as in
> the PyArray version) an incref/decref, and no other structural changes
> are necessary.

If I read the source right, the current Unicode implementation
converts the unicode string to a regular string using the default
encoding when a buffer is requested.  Presumably this will need to be
re-thought for Python 3000 since non-unicode strings are going away.

However, for certain read-only types (like 2.5-style strings) their
implementation is already a PyObject with an array tacked on to the
end.  These could be subtypes of the PyArray type with very little
trouble, and it would only be necessary to maintain one reference
counter for them instead of two.
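
A minimal sketch of the single-counter idea, assuming a hypothetical
read-only string type (RoString, ro_string_new, ro_string_acquire, and
ro_string_release are illustrative names, not part of any proposed API).
Because the data never changes and sits in the same block as the header,
acquiring the buffer can be a plain incref and releasing it a plain decref
on the object's one reference count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical read-only string: one refcount, data in the same allocation. */
typedef struct {
    long refcnt;
    size_t length;
    char data[];   /* C99 flexible array member */
} RoString;

/* Copy length bytes from src into a new object; returns NULL on failure. */
static RoString *ro_string_new(const char *src, size_t length)
{
    RoString *s = malloc(sizeof(RoString) + length + 1);

    if (s == NULL)
        return NULL;
    s->refcnt = 1;
    s->length = length;
    memcpy(s->data, src, length);
    s->data[length] = '\0';
    return s;
}

/* Acquire the buffer: the data is immutable, so this is just an incref. */
static const char *ro_string_acquire(RoString *s, size_t *length)
{
    s->refcnt++;
    *length = s->length;
    return s->data;
}

/* Release: a decref on the same counter.  Returns 1 if the object was freed. */
static int ro_string_release(RoString *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt == 0) {
        free(s);
        return 1;
    }
    return 0;
}
```

Here the owner's reference and every outstanding buffer view share the same
counter, so the object is freed only after the last of them is released.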

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC