[Python-Dev] Extended Buffer Interface/Protocol
Travis Oliphant
oliphant.travis at ieee.org
Thu Mar 22 03:48:25 CET 2007
Greg Ewing wrote:
> Travis Oliphant wrote:
>
>
>>The question is should we eliminate the possibility of sharing memory
>>for objects that store data basically as "arrays" of arrays (i.e. true
>>C-style arrays).
>
>
> Can you clarify what you mean by this? Are you talking
> about an array of pointers to other arrays? (This is
> not what I would call an "array of arrays", even in C.)
I'm talking about arrays of pointers to other arrays:
i.e. if somebody set up in C something like

    float **B;    /* 10 row pointers, each pointing to 20 floats */

then B would be an array of pointers to arrays of floats.
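For concreteness, a structure like that might be set up in C roughly as
follows (a sketch only; the function name and error handling are just
illustrative):

    /* Sketch: allocate an nrows x ncols "array of arrays" as an array
       of row pointers, the way PIL-style image buffers are laid out.
       Each row lives in its own memory block. */
    #include <stdlib.h>

    static float **alloc_rows(size_t nrows, size_t ncols)
    {
        size_t i;
        float **rows = malloc(nrows * sizeof(float *));
        if (rows == NULL)
            return NULL;
        for (i = 0; i < nrows; i++) {
            rows[i] = malloc(ncols * sizeof(float));
            if (rows[i] == NULL) {
                while (i--)
                    free(rows[i]);
                free(rows);
                return NULL;
            }
        }
        return rows;    /* rows[i][j] addresses element (i, j) */
    }

Because each row can live in a different memory block, a single
base-pointer-plus-strides description cannot express this layout.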
>
> Supporting this kind of thing could be a slippery slope,
> since there can be arbitrary levels of complexity to
> such a structure. E.g do you support a 1d array of
> pointers to 3d arrays of pointers to 2d arrays? Etc.
>
Yes, I saw that. But it could actually be supported, in general.
The shape information is available. If a 3-d array is meant, then
ndims is 3 and you would re-cast the returned pointer appropriately.
In other words, suppose that instead of strides you can request a
variable through the buffer interface with type void **segments.
Then, by passing the address of a void * variable to the routine, you
would receive the array, and you could handle the 1-d, 2-d, and 3-d
cases with something like the following pseudocode:
void *segments;
int ndims;
Py_ssize_t *shape;
char *format;
Py_ssize_t i, j, k;

/* &ndims, &shape, &format, and &segments are passed to (and filled in
   by) the buffer interface. */

if (strcmp(format, "f") != 0) {
    /* raise an error and return: only float data is handled here */
}

if (ndims == 1) {
    float *var = (float *)segments;
    for (i = 0; i < shape[0]; i++) {
        /* process var[i] */
    }
}
else if (ndims == 2) {
    float **var = (float **)segments;
    for (i = 0; i < shape[0]; i++)
        for (j = 0; j < shape[1]; j++) {
            /* process var[i][j] */
        }
}
else if (ndims == 3) {
    float ***var = (float ***)segments;
    for (i = 0; i < shape[0]; i++)
        for (j = 0; j < shape[1]; j++)
            for (k = 0; k < shape[2]; k++) {
                /* process var[i][j][k] */
            }
}
else {
    /* raise an error: unsupported number of dimensions */
}
> The more different kinds of format you support, the less
> likely it becomes that the thing consuming the data
> will be willing to go to the trouble required to
> understand it.
That is certainly true. I'm really only going to the trouble because
the multiple-segment buffer interface already exists and the PIL has
this memory model (although I have not heard PIL developers clamoring
for support; I'm just being sensitive to that extension type).
>
>
>>One possible C-API call that Python could grow with the current buffer
>>interface is to allow contiguous-memory mirroring of discontiguous
>>memory,
>
>
> I don't think the buffer protocol itself should incorporate
> anything that requires implicitly copying the data, since
> the whole purpose of it is to provide direct access to the
> data without need for copying.
No, this would not be the buffer protocol, but merely a C-API that would
use the buffer protocol - i.e. it is just a utility function as you mention.
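For example, such a utility might look roughly like the following
(purely hypothetical function name, and only the 2-d float case shown;
a real version would dispatch on ndims and format):

    /* Hypothetical helper, not part of any proposed API: copy a 2-d
       array of row pointers into a single contiguous buffer that the
       caller must free. */
    #include <Python.h>
    #include <stdlib.h>
    #include <string.h>

    static float *mirror_contiguous_2d(float **rows, Py_ssize_t nrows,
                                       Py_ssize_t ncols)
    {
        Py_ssize_t i;
        float *buf = malloc(nrows * ncols * sizeof(float));
        if (buf == NULL)
            return NULL;
        for (i = 0; i < nrows; i++)
            memcpy(buf + i * ncols, rows[i], ncols * sizeof(float));
        return buf;
    }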
>
> It would be okay to supply some utility functions for
> re-packing data, though.
>
>
>>or an iterator object that iterates through every element of any
>>object that exposes the buffer protocol.
>
>
> Again, for efficiency reasons I wouldn't like to involve
> Python objects and iteration mechanisms in this.
I was thinking more of a C-iterator, like NumPy provides. This can be
very efficient (as long as the loop is not in Python).
It sure provides a nice abstraction that lets you deal with
discontiguous arrays as if they were contiguous, though.
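Roughly the kind of thing I mean (a much simplified sketch with made-up
names, not NumPy's actual iterator API):

    /* Sketch of a C-level iterator over a 2-d array of row pointers,
       visiting elements as if the data were contiguous. */
    #include <Python.h>

    typedef struct {
        float **rows;              /* array of pointers to rows */
        Py_ssize_t nrows, ncols;   /* shape */
        Py_ssize_t i, j;           /* current position */
    } rowiter;

    /* Return a pointer to the next element, or NULL when exhausted. */
    static float *rowiter_next(rowiter *it)
    {
        float *p;
        if (it->i >= it->nrows)
            return NULL;
        p = &it->rows[it->i][it->j];
        if (++it->j >= it->ncols) {
            it->j = 0;
            it->i++;
        }
        return p;
    }

The loop that calls rowiter_next stays in C, so the abstraction costs
very little.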
> The buffer interface is meant to give you raw access to the
> data at raw C speeds. Anything else is outside its scope,
Sure. These things are just ideas about *future* utility functions that
might make use of the buffer interface and motivate its design.
Thanks for your comments.
-Travis