[Python-Dev] Extended Buffer Interface/Protocol

Fri Mar 23 06:53:55 CET 2007

(cc'ing back to Python-dev; the original reply was intended for it, but I 
had an email malfunction.)

Travis Oliphant wrote:
> Carl Banks wrote:
>> 3. Allow getbuffer to return an array of "dereference offsets", one for 
>> each dimension.  For a given dimension i, if derefoff[i] is 
>> nonnegative, it's assumed that the current position (base pointer + 
>> indexing so far) is a pointer to a subarray, and derefoff[i] is the 
>> offset in that array where the current position goes for the next 
>> dimension.  If derefoff[i] is negative, there is no dereferencing.  
>> Here is an example of how it'd work:
> 
> 
> This sounds interesting, but I'm not sure I totally see it.  I probably 
> need a picture to figure out what you are proposing. 

I'll get on it sometime.  For now I hope an example will do.

> The derefoff 
> sounds like some-kind of offset.   Is that enough?  Why not just make 
> derefoff[i] == 0 instead of negative?

I may have misunderstood something.  I had thought the values exported 
by getbuffer could change as the view narrowed, but I'm not sure if it's 
the case now.  I'll assume it isn't for now, because it simplifies 
things and demonstrates the concept better.

Let's start from the beginning.  First, change the prototype to this:

     typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                        Py_ssize_t *len, int *writeable,
                                        char **format, int *ndims,
                                        Py_ssize_t **shape,
                                        Py_ssize_t **strides,
                                        int **isptr);

"isptr" is an array of flags, one per dimension, indicating whether the 
position we've strided to so far is a pointer that should be followed 
before proceeding with the rest of the strides.

Now here's what a general "get_item_pointer" function would look like, 
given a set of indices:

void* get_item_pointer(int ndim, void* buf, Py_ssize_t* strides,
                        int* isptr, Py_ssize_t *indices) {
     char* pointer = (char*)buf;
     int i;
     for (i = 0; i < ndim; i++) {
         pointer += strides[i]*indices[i];
         if (isptr[i]) {
             pointer = *(char**)pointer;
         }
     }
     return (void*)pointer;
}

> I don't fully understand the PIL example you gave.

Yeah.  How about more details.  Here is a hypothetical image data object 
structure:

struct rgba {
     unsigned char r, g, b, a;
};

struct ImageObject {
     PyObject_HEAD;
     ...
     struct rgba** lines;
     Py_ssize_t height;
     Py_ssize_t width;
     Py_ssize_t shape_array[2];
     Py_ssize_t stride_array[2];
     Py_ssize_t view_count;
};

"lines" points to a malloced 1-D array of (struct rgba*).  Each pointer in 
THAT block points to a separately malloced array of (struct rgba).  Got 
that?

In order to access, say, the red value of the pixel at x=30, y=50, you'd 
use "lines[50][30].r".

So what does ImageObject's getbuffer do?  Leaving error checking out:

PyObject* getbuffer(PyObject *self, void **buf, Py_ssize_t *len,
                     int *writeable, char **format, int *ndims,
                     Py_ssize_t **shape, Py_ssize_t **strides,
                     int **isptr) {

     /* self arrives as a PyObject*, so cast it to reach the fields */
     struct ImageObject *img = (struct ImageObject *)self;
     static int _isptr[2] = { 1, 0 };

     *buf = img->lines;
     *len = img->height*img->width;
     *writeable = 1;
     *ndims = 2;
     img->shape_array[0] = img->height;
     img->shape_array[1] = img->width;
     *shape = img->shape_array;
     img->stride_array[0] = sizeof(struct rgba*);  /* yep */
     img->stride_array[1] = sizeof(struct rgba);
     *strides = img->stride_array;
     *isptr = _isptr;

     img->view_count ++;
     /* create and return view object here, but for what? */
}

There are three essential differences from a regular, contiguous array.

1. buf is set to point at the array of pointers, not directly to the data.

2. The isptr thing.  isptr[0] is true to indicate that the first 
dimension is an array of pointers, not the actual data.

3. strides[0] is sizeof(struct rgba*), not self->width*sizeof(struct 
rgba) like it would be for a contiguous array.  This is because your 
first stride is through an array of pointers, not the data itself.
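
For comparison, here is a rough sketch of the contiguous case run through the same kind of walk.  It is only an illustration: ptrdiff_t stands in for Py_ssize_t so it compiles without Python.h, and contiguous_matches is a hypothetical helper, not part of any proposal.  With every isptr flag false, the walk is just base pointer plus the sum of strides[i]*indices[i]:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rgba {
    unsigned char r, g, b, a;
};

/* Generic walk: stride, then follow a pointer if isptr[i] is set. */
void *get_item_pointer(int ndim, void *buf, const ptrdiff_t *strides,
                       const int *isptr, const ptrdiff_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (isptr[i]) {
            pointer = *(char **)pointer;
        }
    }
    return (void *)pointer;
}

/* Hypothetical check: one flat block of height*width pixels.
   Returns 1 if the walk lands on &pixels[y*width + x]. */
int contiguous_matches(ptrdiff_t height, ptrdiff_t width,
                       ptrdiff_t y, ptrdiff_t x)
{
    struct rgba *pixels = calloc((size_t)(height * width), sizeof *pixels);
    ptrdiff_t strides[2];
    ptrdiff_t indices[2];
    int isptr[2] = { 0, 0 };   /* no dimension holds pointers */
    int ok;

    assert(pixels != NULL);
    strides[0] = width * (ptrdiff_t)sizeof(struct rgba);  /* one full row */
    strides[1] = (ptrdiff_t)sizeof(struct rgba);          /* one pixel */
    indices[0] = y;
    indices[1] = x;

    ok = get_item_pointer(2, pixels, strides, isptr, indices)
         == (void *)&pixels[y * width + x];
    free(pixels);
    return ok;
}
```

The only things that differ from the ImageObject case are the values of buf, strides[0], and isptr[0]; the walk itself is unchanged.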

So let's examine what "get_item_pointer" above will do given these 
values.  Once again, we're looking for the pixel at x=30, y=50.

First, we set pointer to buf, that is, self->lines.

Then we take the first stride: we add indices[0]*strides[0], that is, 
50*4=200 (assuming 4-byte pointers), to pointer.  pointer now equals 
&self->lines[50].

Now, we check isptr[0].  We see that it is true.  Thus, the position 
we've strided to is, in fact, a pointer to a subarray where the actual 
data is.  So we follow it: pointer = *pointer.  pointer now equals 
self->lines[50] which equals &self->lines[50][0].

Next dimension.  We take the second stride: we add indices[1]*strides[1], 
that is, 30*4=120, to pointer.  pointer now equals &self->lines[50][30].

Now, we check isptr[1].  It's false.  No dereferencing this step.

We're done.  Return pointer.
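
The walk-through can be checked with a small standalone sketch.  The assumptions here are mine: ptrdiff_t stands in for Py_ssize_t so it builds without Python.h, and alloc_lines and lookup_matches are hypothetical helpers that exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rgba {
    unsigned char r, g, b, a;
};

/* Same walk as get_item_pointer above. */
void *get_item_pointer(int ndim, void *buf, const ptrdiff_t *strides,
                       const int *isptr, const ptrdiff_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (isptr[i]) {
            pointer = *(char **)pointer;
        }
    }
    return (void *)pointer;
}

/* Hypothetical helper: an array of row pointers, each row malloced
   separately, like the "lines" member described above. */
struct rgba **alloc_lines(ptrdiff_t height, ptrdiff_t width)
{
    struct rgba **lines = malloc((size_t)height * sizeof *lines);
    ptrdiff_t y;
    assert(lines != NULL);
    for (y = 0; y < height; y++) {
        lines[y] = calloc((size_t)width, sizeof **lines);
        assert(lines[y] != NULL);
    }
    return lines;
}

/* Returns 1 if the generic walk lands on &lines[y][x], 0 otherwise. */
int lookup_matches(ptrdiff_t height, ptrdiff_t width,
                   ptrdiff_t y, ptrdiff_t x)
{
    struct rgba **lines = alloc_lines(height, width);
    ptrdiff_t strides[2] = { sizeof(struct rgba *), sizeof(struct rgba) };
    int isptr[2] = { 1, 0 };   /* dimension 0 is an array of pointers */
    ptrdiff_t indices[2];
    ptrdiff_t i;
    int ok;

    indices[0] = y;
    indices[1] = x;
    ok = get_item_pointer(2, lines, strides, isptr, indices)
         == (void *)&lines[y][x];

    for (i = 0; i < height; i++) {
        free(lines[i]);
    }
    free(lines);
    return ok;
}
```

Calling lookup_matches(100, 64, 50, 30) exercises the x=30, y=50 case traced above.  Because strides[0] is taken from sizeof, the check does not depend on pointers being 4 bytes wide.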

>> By the way, has anyone signed up to modify the standard library 
>> modules?  I could do those when the protocol is finalized.  And if 
>> you're implementing the new buffer protocol in 2.6 (while deprecating 
>> but not removing the old protocol, I presume), will the modules also 
>> be updated for 2.6?
>>
> Nobody has signed up for anything.  I'm willing for anyone to help.   
> Many of the standard library modules will need to be modified.   And 
> yes, I do want to implement the new protocol for 2.6 (adding it to the 
> current one).  Updating the modules for 2.6 would not be high priority 
> (except the struct module), but is desirable.

Ok, then, consider me available for it.

> Thanks for the ideas.

Ok, I have two questions, now.

First, I'm not sure why getbuffer needs to return a view object.  I 
expect most views of data to be created separately--for instance, a view 
of an image is likely to be created in Python using something like this:

    imgview = ImageView(image,(left,right),(top,bottom))

I'd expect the ImageView object would call getbuffer and use the data 
returned in buf, len, writable, etc., and would have no need for a 
type-specific view object.

Furthermore, I would expect that in many cases several different views 
are desirable, and that in some cases the viewer is unknown to the 
exporter.

And, if it does have to return a view for some reason, why bother 
returning buf, len, and friends in the function?  Just return those 
values in the view object.

Second question: what happens if a view wants to re-export the buffer? 
Do the views of the buffer ever change?  For example, say you create a 
transposed view of a NumPy array.  Now you want a slice of the 
transposed array.  What does the transposed view's getbuffer export?

Naively, I'd expect the "strides" and "shape" array to have rearranged 
indices, but it looks like you might be trying to get rid of this 
complexity.

The reason I ask is: if things like "buf" and "strides" and "shape" 
could change when a buffer is re-exported, then it can complicate things 
for PIL-like buffers.  (How would you account for offsets in a dimension 
that's in a subarray?)
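
To make the problem concrete, here is a sketch of one possible answer, based on my reading of the "dereference offsets" idea quoted at the top.  It is not a settled design.  ptrdiff_t stands in for Py_ssize_t, and column_view_matches is a hypothetical helper.  A row offset can be folded into buf, but a column offset cannot, because buf points at the pointer array and not at the pixels; the column offset has to be applied after the dereference:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rgba {
    unsigned char r, g, b, a;
};

/* derefoff[i] < 0:  ordinary strided dimension, no dereference.
   derefoff[i] >= 0: follow the pointer, then skip derefoff[i] bytes. */
void *get_item_pointer_derefoff(int ndim, void *buf,
                                const ptrdiff_t *strides,
                                const ptrdiff_t *derefoff,
                                const ptrdiff_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (derefoff[i] >= 0) {
            pointer = *(char **)pointer + derefoff[i];
        }
    }
    return (void *)pointer;
}

/* Hypothetical check: export a view starting at row "top" and column
   "left", then confirm that view index (vy, vx) lands on
   &lines[top + vy][left + vx].  Returns 1 on a match. */
int column_view_matches(ptrdiff_t height, ptrdiff_t width,
                        ptrdiff_t top, ptrdiff_t left,
                        ptrdiff_t vy, ptrdiff_t vx)
{
    struct rgba **lines = malloc((size_t)height * sizeof *lines);
    ptrdiff_t strides[2], derefoff[2], indices[2];
    ptrdiff_t i;
    int ok;

    assert(lines != NULL);
    for (i = 0; i < height; i++) {
        lines[i] = calloc((size_t)width, sizeof **lines);
        assert(lines[i] != NULL);
    }

    strides[0] = (ptrdiff_t)sizeof(struct rgba *);
    strides[1] = (ptrdiff_t)sizeof(struct rgba);
    derefoff[0] = left * (ptrdiff_t)sizeof(struct rgba); /* column offset */
    derefoff[1] = -1;                                    /* plain data */
    indices[0] = vy;
    indices[1] = vx;

    /* The row offset is folded into the base pointer: buf = lines + top. */
    ok = get_item_pointer_derefoff(2, lines + top, strides, derefoff, indices)
         == (void *)&lines[top + vy][left + vx];

    for (i = 0; i < height; i++) {
        free(lines[i]);
    }
    free(lines);
    return ok;
}
```

With only an isptr flag there is nowhere to put the column offset, which is why I'm asking whether re-exported views are allowed to change these values at all.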

Carl Banks