Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)
Carl Banks wrote:
Travis Oliphant wrote:
Carl Banks wrote:
OK, I've thought quite a bit about this, and I have an idea that I think will be OK with you and will let me drop my main objection. It's not a big change, either. The key is to say explicitly whether a flag allows a feature or requires it. But I made a few other changes as well.
I'm good with using an identifier to differentiate between an "allowed" flag and a "required" flag. I'm not a big fan of VERY_LONG_IDENTIFIER_NAMES, though: they should be just long enough to make the meaning clear, but not so long that they take forever to type and use up horizontal real estate.
That's fine with me. I'm not very particular about spellings, as long as they're not misleading.
Now, here is a key point: for these functions to work (indeed, for PyObject_GetBuffer to work at all), you need enough information in bufinfo to figure it out. The bufferinfo struct should be self-contained; you should not need to know what flags were passed to PyObject_GetBuffer in order to know exactly what data you're looking at.
Therefore, format must always be supplied by getbuffer. You cannot tell if an array is contiguous without the format string. (But see below.)
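To make the dependency concrete, here is a minimal Python sketch of a C-contiguity test over the bufferinfo fields. The function name `is_c_contiguous` and its signature are hypothetical; the point is only that the check compares each stride against a running product that starts at the item size, so it cannot be done without knowing the item size (which the format string implies).

```python
def is_c_contiguous(shape, strides, itemsize):
    """Hypothetical helper: True if shape/strides describe a C-contiguous
    (row-major, gap-free) block of items that are `itemsize` bytes each."""
    expected = itemsize  # stride the innermost dimension must have
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # A dimension of length 1 never steps, so its stride is irrelevant.
        if dim != 1 and stride != expected:
            return False
        expected *= dim  # stride required for the next-outer dimension
    return True


# Same shape and strides, different item sizes:
print(is_c_contiguous((10,), (2,), itemsize=2))  # True
print(is_c_contiguous((10,), (2,), itemsize=1))  # False
```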
No, I don't think this is quite true. You don't need to know what "kind" of data you are looking at if you don't get strides. If you use the SIMPLE interface, then both consumer and exporter know the object is looking at "bytes" which always has an itemsize of 1.
But doesn't this violate the above maxim? Suppose these are the contents of bufinfo:
    ndim = 1
    len = 20
    shape = (10,)
    strides = (2,)
    format = NULL
In my thinking, format/itemsize is necessary if you have strides (as you do here) but not if you don't have stride information (i.e., you are assuming a C_CONTIGUOUS memory chunk). The intent of the simple interface is basically to let consumers mimic the old buffer protocol very easily.
How does it know whether it's looking at contiguous array of 10 two-byte objects, or a discontiguous array of 10 one-byte objects, without having at least an item size? Since item size is now in the mix, it's moot, of course.
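Python 3's `memoryview`, which implements the protocol as it was eventually accepted, reproduces exactly this ambiguity. In the sketch below, both views report `shape == (10,)` and `strides == (2,)` over the same 20 bytes; only `itemsize` (derived from `format`) distinguishes the contiguous case from the strided one.

```python
raw = memoryview(bytes(20))

# Ten contiguous 2-byte unsigned shorts.
shorts = raw.cast("H")

# Every other byte: ten 1-byte items with a one-byte gap after each.
every_other = raw[::2]

for name, view in (("shorts", shorts), ("every_other", every_other)):
    print(name, view.shape, view.strides, view.itemsize, view.c_contiguous)
# shorts (10,) (2,) 2 True
# every_other (10,) (2,) 1 False
```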
My only real concern is to have some way to tell the exporter that it doesn't need to "figure out" the format if the consumer doesn't care about it. Given the open-ended nature of the format string, an exporter could undertake a costly format-string construction step even when the consumer has no use for the result.
I can see you are considering the buffer structure as a self-introspecting structure, whereas I was considering it only in terms of how the consumer would use its members (which implied the consumer knew what it was asking for and wouldn't touch anything else).
How about we assume FORMAT will always be filled in, but add a Py_BUF_REQUIRE_PRIMITIVE flag under which the exporter returns only "primitive" format strings (i.e., basic C types)? An exporter receiving this flag will have to return complicated data types as 'bytes'. I would add this flag to the Py_BUF_SIMPLE default.
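A rough sketch of how an exporter might honor such a flag, written in Python for brevity. Everything here is hypothetical: the flag value, the `BufInfo` record, and `get_buffer_info` are invented for illustration, and `struct`-module codes stand in for PEP 3118 format strings. A compound item such as `'<id'` (an int followed by a double) is degraded to unsigned bytes, so `itemsize` and `shape` change with it while `len` stays the same.

```python
import struct
from dataclasses import dataclass

# Hypothetical flag value; the thread names the flag but assigns no number.
REQUIRE_PRIMITIVE = 0x1

_BYTE_ORDER_CHARS = "@=<>!"
_PRIMITIVE_CODES = set("cbB?hHiIlLqQnNefdP")


@dataclass
class BufInfo:
    """Hypothetical subset of the bufferinfo fields for a 1-D buffer."""
    len: int
    itemsize: int
    format: str
    shape: tuple


def is_primitive(fmt):
    # A single struct code, optionally preceded by a byte-order character.
    body = fmt.lstrip(_BYTE_ORDER_CHARS)
    return len(body) == 1 and body in _PRIMITIVE_CODES


def get_buffer_info(native_format, count, flags):
    """Hypothetical exporter for `count` contiguous items of `native_format`."""
    itemsize = struct.calcsize(native_format)
    total = itemsize * count
    if flags & REQUIRE_PRIMITIVE and not is_primitive(native_format):
        # Consumer only accepts basic C types: expose the items as raw bytes.
        return BufInfo(len=total, itemsize=1, format="B", shape=(total,))
    return BufInfo(len=total, itemsize=itemsize, format=native_format,
                   shape=(count,))


print(get_buffer_info("<id", 10, 0))
print(get_buffer_info("<id", 10, REQUIRE_PRIMITIVE))
```

Degrading to bytes rather than raising keeps the cheap path cheap: the exporter never has to build a complicated format string for a consumer that asked only for primitive types.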
The idea that Py_BUF_SIMPLE implies bytes is news to me. What if you want a contiguous, one-dimensional array of an arbitrary type? I was thinking this would be acceptable with Py_BUF_SIMPLE.
Unsigned bytes are just the lowest common denominator. They represent the old way of sharing memory. Doesn't an "arbitrary type" mean bytes? Or did you mean what if you wanted a contiguous, one-dimensional array of a *specific* type?
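In the protocol as it eventually shipped in Python 3, this lowest-common-denominator view is what `memoryview.cast('B')` provides: any C-contiguous exporter can be reinterpreted as a flat run of unsigned bytes. A small illustration using the standard `array` module:

```python
import array

# An exporter whose items are unsigned shorts (format 'H').
typed = memoryview(array.array("H", range(10)))

# An old-style consumer only wants the raw memory, viewed as unsigned bytes.
raw = typed.cast("B")

print(typed.format, typed.itemsize, typed.shape)  # H 2 (10,)
print(raw.format, raw.itemsize, raw.shape)        # B 1 (20,)
```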
It seems you want to require Py_BUF_FORMAT for that, which would suggest to me that... But now it seems even more unnecessary than it did before. Wouldn't any consumer that just wants to look at a chunk of bytes always use Py_BUF_FORMAT, especially if there's a danger of a presumptuous exporter raising an exception?
I'll put the REQUIRE_PRIMITIVE_FORMAT idea in the next update to the PEP. I can just check my changes in to SVN, so it should show up by Friday.