[Python-Dev] PEP 3118: Extended buffer protocol (new version)
Carl Banks
pythondev at aerojockey.com
Tue Apr 17 08:36:39 CEST 2007
Travis Oliphant wrote:
> Carl Banks wrote:
>> My recommendation is, any flag should turn on some circle in the Venn
>> diagram (it could be a circle I didn't draw--shaped arrays, for
>> example--but it should be *some* circle).
> I don't think your Venn diagram is broad enough and it un-necessarily
> limits the use of flags to communicate between consumer and exporter.
> We don't have to ram these flags down that point-of-view for them to be
> productive. If you have a specific alternative proposal, or specific
> criticisms, then I'm very willing to hear them.
Ok, I've thought quite a bit about this, and I have an idea that I think
will be ok with you, and I'll be able to drop my main objection. It's
not a big change, either. The key is to explicitly say whether the flag
allows or requires. But I made a few other changes as well.
First of all, let me define how I'm using the word "contiguous": it's a
single buffer with no gaps. So, if you were to do this:
"memset(bufinfo->buf,0,bufinfo->len)", you would not touch any data that
isn't being exported.
Without further ado, here is my proposal:
------
With no flags, PyObject_GetBuffer will raise an exception if the
buffer is not direct, contiguous, and one-dimensional. Here are the
flags and how they affect that:
Py_BUF_REQUIRE_WRITABLE - Raise exception if the buffer isn't writable.
Py_BUF_REQUIRE_READONLY - Raise exception if the buffer is writable.
Py_BUF_ALLOW_NONCONTIGUOUS - Allow noncontiguous buffers. (This turns
on "shape" and "strides".)
Py_BUF_ALLOW_MULTIDIMENSIONAL - Allow multidimensional buffers. (Also
turns on "shape" and "strides".)
(Neither of the above two flags implies the other.)
Py_BUF_ALLOW_INDIRECT - Allow indirect buffers. Implies
Py_BUF_ALLOW_NONCONTIGUOUS and Py_BUF_ALLOW_MULTIDIMENSIONAL. (Turns on
"shape", "strides", and "suboffsets".)
Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY or Py_BUF_REQUIRE_ROW_MAJOR - Raise an
exception if the array isn't a contiguous array in C (row-major)
format.
Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY or Py_BUF_REQUIRE_COLUMN_MAJOR -
Raise an exception if the array isn't a contiguous array in Fortran
(column-major) format.
Py_BUF_ALLOW_NONCONTIGUOUS, Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY, and
Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY all conflict with each other,
and an exception should be raised if more than one is set.
(I would go with ROW_MAJOR and COLUMN_MAJOR: even though the terms only
make sense for 2D arrays, I believe the terms are commonly generalized
to other dimensions.)
Possible pseudo-flags:
Py_BUF_SIMPLE = 0;
Py_BUF_ALLOW_STRIDED = Py_BUF_ALLOW_NONCONTIGUOUS
| Py_BUF_ALLOW_MULTIDIMENSIONAL;
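
As a rough illustration of the flag list and the conflict rule, the flags could be laid out as bit masks along these lines. The numeric values and the helper flags_are_consistent are assumptions made for this sketch; the proposal names the flags but assigns no values and defines no such helper.

```c
#include <assert.h>

/* Hypothetical bit values: the proposal only names the flags. */
enum {
    Py_BUF_SIMPLE                           = 0,
    Py_BUF_REQUIRE_WRITABLE                 = 1 << 0,
    Py_BUF_REQUIRE_READONLY                 = 1 << 1,
    Py_BUF_ALLOW_NONCONTIGUOUS              = 1 << 2,
    Py_BUF_ALLOW_MULTIDIMENSIONAL           = 1 << 3,
    /* ALLOW_INDIRECT implies the two flags above, so it carries their bits. */
    Py_BUF_ALLOW_INDIRECT                   = (1 << 4) | (1 << 2) | (1 << 3),
    Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY       = 1 << 5,
    Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY = 1 << 6,
    Py_BUF_ALLOW_STRIDED = Py_BUF_ALLOW_NONCONTIGUOUS
                         | Py_BUF_ALLOW_MULTIDIMENSIONAL
};

/* Hypothetical helper: returns 1 if at most one of the three mutually
 * exclusive layout flags is set, 0 if the request conflicts. */
int flags_are_consistent(int flags)
{
    int layout_flags = 0;

    assert(flags >= 0);
    if (flags & Py_BUF_ALLOW_NONCONTIGUOUS)
        layout_flags++;
    if (flags & Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY)
        layout_flags++;
    if (flags & Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY)
        layout_flags++;
    return layout_flags <= 1;
}
```

Because ALLOW_INDIRECT carries the ALLOW_NONCONTIGUOUS bit in this layout, combining it with either REQUIRE_CONTIGUOUS flag is rejected by the same rule, with no special case.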
------
Now, for each flag, there should be an associated function to test the
condition, given a bufferinfo struct. (Though I suppose they don't
necessarily have to map one-to-one, I'll do that here.)
int PyBufferInfo_IsReadonly(struct bufferinfo*);
int PyBufferInfo_IsWritable(struct bufferinfo*);
int PyBufferInfo_IsContiguous(struct bufferinfo*);
int PyBufferInfo_IsMultidimensional(struct bufferinfo*);
int PyBufferInfo_IsIndirect(struct bufferinfo*);
int PyBufferInfo_IsRowMajor(struct bufferinfo*);
int PyBufferInfo_IsColumnMajor(struct bufferinfo*);
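
To give a feel for how small the simpler of these tests could be, here is a minimal sketch of four of them. The struct layout is an assumption for illustration only (it is not taken from the PEP draft), and the sketch assumes an exporter leaves suboffsets NULL for a direct buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout, for illustration only. */
struct bufferinfo {
    void *buf;
    ptrdiff_t len;
    int readonly;           /* nonzero if the memory must not be written */
    int ndim;               /* number of dimensions, at least 1 */
    ptrdiff_t *shape;
    ptrdiff_t *strides;
    ptrdiff_t *suboffsets;  /* assumed NULL when the buffer is direct */
};

int PyBufferInfo_IsReadonly(struct bufferinfo *info)
{
    assert(info != NULL);
    return info->readonly != 0;
}

int PyBufferInfo_IsWritable(struct bufferinfo *info)
{
    assert(info != NULL);
    return info->readonly == 0;
}

int PyBufferInfo_IsMultidimensional(struct bufferinfo *info)
{
    assert(info != NULL);
    return info->ndim > 1;
}

int PyBufferInfo_IsIndirect(struct bufferinfo *info)
{
    assert(info != NULL);
    return info->suboffsets != NULL;
}
```

Each test reads only the struct, never the flags the consumer passed in.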
The function PyObject_GetBuffer then has a pretty obvious
implementation. Here is an excerpt:
    if ((flags & Py_BUF_REQUIRE_READONLY) &&
        !PyBufferInfo_IsReadonly(&bufinfo)) {
        PyErr_SetString(PyExc_BufferError, "buffer not read-only");
        return 0;
    }
Pretty straightforward, no?
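
The same pattern extends to the remaining flags. The sketch below shows the allow/require split in one place: REQUIRE flags add a restriction, while ALLOW flags lift one of the default restrictions. Everything here is hypothetical: the shortened flag names, their bit values, the buffer_traits summary struct, and check_request, which returns an error message instead of setting a Python exception so that it can stand alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values, with the Py_BUF_ prefix dropped for brevity. */
enum {
    REQUIRE_WRITABLE       = 1 << 0,
    REQUIRE_READONLY       = 1 << 1,
    ALLOW_NONCONTIGUOUS    = 1 << 2,
    ALLOW_MULTIDIMENSIONAL = 1 << 3,
    INDIRECT_BIT           = 1 << 4,
    /* ALLOW_INDIRECT implies the two other ALLOW flags. */
    ALLOW_INDIRECT = INDIRECT_BIT | ALLOW_NONCONTIGUOUS | ALLOW_MULTIDIMENSIONAL
};

/* Hypothetical summary of what the exporter has to offer. */
struct buffer_traits {
    int readonly;
    int contiguous;
    int ndim;
    int indirect;
};

/* Returns NULL if the request can be satisfied, else an error message. */
const char *check_request(int flags, const struct buffer_traits *t)
{
    assert(t != NULL);
    /* REQUIRE flags: fail if the buffer lacks the property. */
    if ((flags & REQUIRE_WRITABLE) && t->readonly)
        return "buffer not writable";
    if ((flags & REQUIRE_READONLY) && !t->readonly)
        return "buffer not read-only";
    /* ALLOW flags: fail if the buffer has the property and it was not allowed. */
    if (!(flags & ALLOW_NONCONTIGUOUS) && !t->contiguous)
        return "buffer not contiguous";
    if (!(flags & ALLOW_MULTIDIMENSIONAL) && t->ndim > 1)
        return "buffer not one-dimensional";
    if (!(flags & INDIRECT_BIT) && t->indirect)
        return "buffer not direct";
    return NULL;
}
```

With no flags set, only a direct, contiguous, one-dimensional buffer passes, which matches the default described in the proposal.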
Now, here is a key point: for these functions to work (indeed, for
PyObject_GetBuffer to work at all), you need enough information in
bufinfo to figure it out. The bufferinfo struct should be
self-contained; you should not need to know what flags were passed to
PyObject_GetBuffer in order to know exactly what data you're looking at.
Therefore, format must always be supplied by getbuffer. You cannot tell
if an array is contiguous without the format string. (But see below.)
And even if the consumer isn't asking for a contiguous buffer, it has to
know the item size so it knows what data not to step on.
(This is true even in your own proposal, BTW. If a consumer asks for a
non-strided array in your proposal, PyObject_GetBuffer would have to
know the item size to determine if the array is contiguous.)
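
To make the item-size dependence concrete, here is a sketch of a row-major and a column-major check over shape and strides. The function names and signatures are hypothetical, strides are assumed to be in bytes, and length-1 dimensions get no special treatment, which a real implementation would probably want.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the strides describe a gap-free C (row-major) array:
 * the last dimension steps by itemsize, and each earlier dimension
 * steps by the full extent of the dimensions after it. */
int strides_are_row_major(int ndim, const ptrdiff_t *shape,
                          const ptrdiff_t *strides, ptrdiff_t itemsize)
{
    ptrdiff_t expected = itemsize;
    int i;

    assert(ndim >= 1 && shape != NULL && strides != NULL && itemsize > 0);
    for (i = ndim - 1; i >= 0; i--) {
        if (strides[i] != expected)
            return 0;
        expected *= shape[i];
    }
    return 1;
}

/* Same test for Fortran (column-major) order: the first dimension
 * steps by itemsize and the expected stride grows left to right. */
int strides_are_column_major(int ndim, const ptrdiff_t *shape,
                             const ptrdiff_t *strides, ptrdiff_t itemsize)
{
    ptrdiff_t expected = itemsize;
    int i;

    assert(ndim >= 1 && shape != NULL && strides != NULL && itemsize > 0);
    for (i = 0; i < ndim; i++) {
        if (strides[i] != expected)
            return 0;
        expected *= shape[i];
    }
    return 1;
}
```

A 2x3 array with strides of 24 and 8 bytes is row-major contiguous when the items are 8 bytes wide, but has gaps when they are 4 bytes wide. Shape and strides alone cannot distinguish the two cases.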
------
FAQ:
Q. Why ALLOW_NONCONTIGUOUS and ALLOW_MULTIDIMENSIONAL instead of
ALLOW_STRIDED and ALLOW_SHAPED?
A. It's more useful to the consumer that way. With ALLOW_STRIDED and
ALLOW_SHAPED, there's no way for a consumer to request a general
one-dimensional array (it can only request a non-strided one-dimensional
array), and requesting a SHAPED array but not a STRIDED one can only
return a C-like (row-major) array, although a consumer might reasonably
want a Fortran-like (column-major) array. This approach maps more
directly to the consumer's needs, is more flexible, and still maintains
the functionality of ALLOW_SHAPED and ALLOW_STRIDED.
Q. Why call it ALLOW_INDIRECT instead of ALLOW_OFFSETS?
A. It's just a name, and not too important to me, but I wanted to
emphasize the consumer's usage, rather than the benefit to the exporter.
The consumers, after all, are the ones setting the flags.
Q. Why ALLOW_NONCONTIGUOUS instead of REQUIRE_CONTIGUOUS?
A. Two reasons: 1. Contiguous arrays are "simpler", so it's better to make
the people who want more complex arrays work harder, and 2.
ALLOW_NONCONTIGUOUS is closely tied to ALLOW_MULTIDIMENSIONAL. If the
negative is a problem, perhaps a name like ALLOW_DISCONTINUOUS or
ALLOW_GAPS would be better?
Q. What about Py_BUF_FORMAT?
A. Ok, fine, if it's that important to you. I think it's totally
superfluous, but it's not evil. But consider these things:
1. Require that it does not throw an exception. It's not the exporter's
business to tell the consumer how to use its data.
2. Even if you don't supply the format string, you need to supply an
itemsize in struct bufferinfo, otherwise there is no way for a consumer
to determine if the array is contiguous, or to know (in general)
what data is being exported. The itemsize must ALWAYS be available.
3. Invert Py_BUF_FORMAT. Use Py_BUF_DONT_NEED_FORMAT instead. Make the
consumer that cares about performance ask for the optimization. (You
admit yourself that Py_BUF_FORMAT is part of the least common
denominator, so invert it.)
I would be -0 on it if all three of these conditions are met.
------
Conclusion:
My main objection, that the flags are confusing because some allow and
others restrict, would be remedied just by using ALLOW and REQUIRE in
the constant. Even if you still want to go with ALLOW_STRIDED and
ALLOW_SHAPED, I'd still be -0 as long as the ALLOW is there.
I still think Py_BUF_FORMAT is superfluous, but I can live with it if
some other things happen.
Carl Banks