[Python-Dev] Extended Buffer Interface/Protocol

Travis Oliphant oliphant.travis at ieee.org
Thu Mar 22 03:48:25 CET 2007


Greg Ewing wrote:
> Travis Oliphant wrote:
> 
> 
>>The question is should we eliminate the possibility of sharing memory 
>>for objects that store data basically as "arrays" of arrays (i.e. true 
>>C-style arrays).
> 
> 
> Can you clarify what you mean by this? Are you talking
> about an array of pointers to other arrays? (This is
> not what I would call an "array of arrays", even in C.)

I'm talking about arrays of pointers to other arrays:

i.e. if somebody defined in C

float *B[10];

and filled each B[i] with a pointer to an array of 20 floats, then B
would be an array of pointers to arrays of floats.
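
To make the layout concrete, here is a rough sketch (hypothetical code,
just for illustration) of how such an object might set up its memory:

#include <stdlib.h>

/* ten "lines", each an independently allocated array of 20 floats;
   the lines need not be contiguous with one another in memory */
static float **make_lines(void)
{
    float **B = malloc(10 * sizeof(float *));
    int i;
    for (i = 0; i < 10; i++)
        B[i] = malloc(20 * sizeof(float));
    return B;   /* B[i][j] then addresses element (i, j) */
}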

> 
> Supporting this kind of thing could be a slippery slope,
> since there can be arbitrary levels of complexity to
> such a structure. E.g do you support a 1d array of
> pointers to 3d arrays of pointers to 2d arrays? Etc.
> 

Yes, I saw that.  But it could actually be supported, in general.
The shape information is available: if a 3-d array is meant, then ndims
is 3 and you would re-cast the returned pointer appropriately.

In other words, suppose that instead of strides you could request,
through the buffer interface, a variable with type void **segments.

Then, by passing the address of a void * variable to the routine you
would receive the array.  You could then handle the 1-d, 2-d, and 3-d
cases using something like this:

This is rough C code (error handling elided):

void *segments;
int ndims;
Py_ssize_t *shape;
char *format;
Py_ssize_t i, j, k;

/* &ndims, &shape, &format, and &segments are passed to the buffer
   interface, which fills them in. */

if (strcmp(format, "f") != 0) {
    /* raise an error: we only handle single-precision floats here */
}

if (ndims == 1) {
    float *var = (float *)segments;
    for (i = 0; i < shape[0]; i++) {
        /* process var[i] */
    }
}
else if (ndims == 2) {
    float **var = (float **)segments;
    for (i = 0; i < shape[0]; i++)
        for (j = 0; j < shape[1]; j++) {
            /* process var[i][j] */
        }
}
else if (ndims == 3) {
    float ***var = (float ***)segments;
    for (i = 0; i < shape[0]; i++)
        for (j = 0; j < shape[1]; j++)
            for (k = 0; k < shape[2]; k++) {
                /* process var[i][j][k] */
            }
}
else {
    /* raise an error: unsupported number of dimensions */
}



> The more different kinds of format you support, the less
> likely it becomes that the thing consuming the data
> will be willing to go to the trouble required to
> understand it.

That is certainly true.  I'm really only going through the trouble
because the multiple-segment buffer interface already exists and the PIL
has this memory model (although I have not heard PIL developers
clamoring for support; I'm just being sensitive to that extension type).

> 
> 
>>One possible C-API call that Python could grow with the current buffer 
>>interface is to allow contiguous-memory mirroring of discontiguous 
>>memory,
> 
> 
> I don't think the buffer protocol itself should incorporate
> anything that requires implicitly copying the data, since
> the whole purpose of it is to provide direct access to the
> data without need for copying.

No, this would not be part of the buffer protocol itself, but merely a
C-API function that would use the buffer protocol, i.e. just a utility
function as you mention.
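
Something along these lines is what I have in mind (the name and
signature are hypothetical, just to illustrate the idea):

/* Copy the data exposed by obj through the buffer interface into a
   freshly malloc'd contiguous block, returning the block and storing
   its length in *len.  The caller owns the returned memory. */
void *copy_to_contiguous(PyObject *obj, Py_ssize_t *len);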

> 
> It would be okay to supply some utility functions for
> re-packing data, though.
> 
> 
>>or an iterator object that iterates through every element of any 
>>object that exposes the buffer protocol.
> 
> 
> Again, for efficiency reasons I wouldn't like to involve
> Python objects and iteration mechanisms in this. 

I was thinking more of a C-iterator, like NumPy provides.  This can be 
very efficient (as long as the loop is not in Python).

It sure provides a nice abstraction that lets you deal with 
discontiguous arrays as if they were contiguous, though.
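
For reference, this is roughly the pattern NumPy's existing C iterator
gives you (simplified, error checking omitted):

#include <numpy/arrayobject.h>

PyArrayIterObject *it = (PyArrayIterObject *)PyArray_IterNew(obj);
while (it->index < it->size) {
    /* it->dataptr points at the current element, no matter how the
       underlying array is strided or segmented */
    float value = *(float *)it->dataptr;
    /* ... process value ... */
    PyArray_ITER_NEXT(it);
}
Py_DECREF(it);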

> The buffer interface is meant to give you raw access to the
> data at raw C speeds. Anything else is outside its scope,

Sure.  These things are just ideas about *future* utility functions that 
might make use of the buffer interface and motivate its design.

Thanks for your comments.


-Travis


