[Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant
oliphant.travis at ieee.org
Fri Apr 13 09:03:04 CEST 2007
Carl Banks wrote:
>
> The thing that bothers me about this whole flags setup is that
> different flags can do opposite things.
>
> Some of the flags RESTRICT the kind of buffers that can be
> exported (Py_BUF_WRITABLE); other flags EXPAND the kind of buffers that
> can be exported (Py_BUF_INDIRECT). That is highly confusing and I'm -1
> on any proposal that includes both behaviors. (Mutually exclusive sets
> of flags are a minor exception: they can be thought of as either
> RESTRICTING or EXPANDING, so they could be mixed with either.)
The mutually exclusive set is the one example of the restriction that
you gave.
I think the flags setup I've described is much closer to your Venn
diagram concept than you give it credit for. I've re-worded some of
the discussion (see
http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/numpy/doc/pep_buffer.txt
) so that it is clearer that each flag is a description of what kind of
buffer the consumer is prepared to deal with.
For example, if the consumer cares about what's 'in' the array, it uses
Py_BUF_FORMAT. Exporters are free to do what they want with this
information. I agree that NumPy would not force you to use its buffer
only as a region of some specific type, but some other object may want
to be much more restrictive and only export to consumers who will
recognize the data stored for what it is. I think it's up to the
exporters to decide whether or not to raise an error when a certain kind
of buffer is requested.
Basically, every flag corresponds to a different property of the buffer
that the consumer is requesting:
Py_BUF_SIMPLE -- get the simplest possible buffer (0x00)
Py_BUF_WRITEABLE -- get a writeable buffer (0x01)
Py_BUF_READONLY -- get a read-only buffer (0x02)
Py_BUF_FORMAT -- get a "formatted" buffer (0x04)
Py_BUF_SHAPE -- get a buffer with shape information (0x08)
Py_BUF_STRIDES -- get a buffer with stride information (and shape) (0x18)
Py_BUF_OFFSET -- get a buffer with suboffsets (and strides and shape) (0x38)
This is a logical sequence. There is progression. Each flag is a bit
that indicates something about how the consumer can use the buffer. In
other words, the consumer states what kind of buffer is being
requested. The exporter obliges (and can save possibly significant time
if the consumer is not requesting the information it must otherwise
produce).
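A minimal sketch of how that progression shows up in the bit values. The numbers are copied from the list above; the Python names and the requests() helper are stand-ins for illustration only, not an existing API.

```python
# Flag values as listed above. These Python constants merely mirror the
# proposed C macros; they are illustrative, not a real module.
Py_BUF_SIMPLE = 0x00
Py_BUF_WRITEABLE = 0x01
Py_BUF_READONLY = 0x02
Py_BUF_FORMAT = 0x04
Py_BUF_SHAPE = 0x08
Py_BUF_STRIDES = 0x18   # 0x10 plus the SHAPE bit
Py_BUF_OFFSET = 0x38    # 0x20 plus the STRIDES and SHAPE bits


def requests(flags, flag):
    """Return True if every bit of `flag` is set in `flags` (hypothetical helper)."""
    return (flags & flag) == flag


# Asking for strides implies asking for shape; asking for suboffsets
# implies asking for strides and shape.
print(requests(Py_BUF_STRIDES, Py_BUF_SHAPE))
print(requests(Py_BUF_OFFSET, Py_BUF_STRIDES))
# The reverse does not hold: a shape-only request says nothing about strides.
print(requests(Py_BUF_SHAPE, Py_BUF_STRIDES))
```

Because Py_BUF_SIMPLE is zero, it is the baseline request that every other combination builds on rather than a bit that can be tested for.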
> I originally suggested a small set of flags that expand the set of
> allowed buffers. Here's a little Venn diagram of buffers to
> illustrate what I was thinking:
>
> http://www.aerojockey.com/temp/venn.png
>
> With no flags, the only buffers allowed to be returned are in the "All"
> circle but no others. Add Py_BUF_WRITABLE and now you can export
> writable buffers as well. Add Py_BUF_STRIDED and the strided circle is
> opened to you, and so on.
>
> My recommendation is, any flag should turn on some circle in the Venn
> diagram (it could be a circle I didn't draw--shaped arrays, for
> example--but it should be *some* circle).
I don't think your Venn diagram is broad enough, and it unnecessarily
limits the use of flags to communicate between consumer and exporter.
We don't have to force these flags into that point of view for them to be
productive. If you have a specific alternative proposal, or specific
criticisms, then I'm very willing to hear them.
I've thought through the flags again, and I'm not sure how I would
change them. They make sense to me. Especially in light of past
usages of the buffer protocol (where most people request read-or-write
buffers, i.e. Py_BUF_SIMPLE). I'm also not sure our mental diagrams are
oriented the same way. For me, the most restrictive requests are
Py_BUF_WRITEABLE | Py_BUF_FORMAT and Py_BUF_READONLY | Py_BUF_FORMAT.
The most unrestrictive request (the largest circle in my mental Venn
diagram) is
Py_BUF_OFFSETS, followed by Py_BUF_STRIDES, followed by Py_BUF_SHAPE.
Adding Py_BUF_FORMAT, Py_BUF_WRITEABLE, or Py_BUF_READONLY serves to
restrict any of the other circles.
Is this dual use of flags what bothers you? That is, the use of some
flags to restrict circles in your Venn diagram that are turned on by
other flags: you say Py_BUF_OFFSETS | Py_BUF_WRITEABLE to get the
intersection of the largest circle, Py_BUF_OFFSETS, with the WRITEABLE
subset.
Such concerns are not convincing to me. Just don't think of the flags
in that way. Think of them as turning "on" members of the bufferinfo
structure.
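A rough sketch of that reading, in which each flag switches on a member of the structure. The BufferInfo class and the export_2x3_doubles() function are hypothetical stand-ins for the bufferinfo structure and an exporter; only the flag values come from this message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Flag values as listed earlier in this message.
Py_BUF_SIMPLE = 0x00
Py_BUF_FORMAT = 0x04
Py_BUF_SHAPE = 0x08
Py_BUF_STRIDES = 0x18


@dataclass
class BufferInfo:
    """Hypothetical stand-in for the bufferinfo structure."""
    nbytes: int
    format: Optional[str] = None
    shape: Optional[Tuple[int, ...]] = None
    strides: Optional[Tuple[int, ...]] = None


def export_2x3_doubles(flags):
    """Hypothetical exporter for a C-contiguous 2x3 array of 8-byte doubles."""
    info = BufferInfo(nbytes=2 * 3 * 8)
    if flags & Py_BUF_FORMAT:
        # Only build the format string when the consumer asked for it.
        info.format = "d"
    if flags & Py_BUF_SHAPE:
        info.shape = (2, 3)
    if (flags & Py_BUF_STRIDES) == Py_BUF_STRIDES:
        # Byte strides: 3 doubles per row, 8 bytes per double.
        info.strides = (24, 8)
    return info


print(export_2x3_doubles(Py_BUF_SIMPLE))
print(export_2x3_doubles(Py_BUF_STRIDES | Py_BUF_FORMAT))
```

Members the consumer did not request stay unset, which is where an exporter could save work.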
>
>
>>>> Py_BUF_FORMAT
>>>> The consumer will be using the format string information so make
>>>> sure that member is filled correctly.
>>>
>>> Is the idea to throw an exception if there's some other data format
>>> besides "b", and this flag isn't set? It seems superfluous otherwise.
>>
>> The idea is that a consumer may not care about the format and the
>> exporter may want to know that to simplify the interface. In other
>> words the flag is a way for the consumer to communicate that it wants
>> format information (or not).
>
> I'm -1 on using the flags for this. It's completely out of character
> compared to the rest of the flags. All other flags are there for the
> benefit of the consumer; this flag is useless to the consumer.
>
> More concretely, all the rest of the flags are there to tell the
> exporter what kind of buffer they're prepared to accept. This flag,
> alone, does not do that.
I agree. This flag is used by the consumer to state that it wants, will
be making note of, and is prepared to deal with a "formatted" buffer.
I think it's short-sighted to have flags to control providing the other
members of the PyBuffer structure and not this one.
Actually, the "rare" optimization to the exporter can still be
significant if most consumers don't care about its format (which
perhaps it has to construct at request time).
>> Whether the exporter raises an exception when the format is not
>> requested is up to the exporter.
>
> That seems like a bad idea. Suppose I have a contiguous numpy array of
> floats and I want to view it as a sequence of bytes. If the exporter's
> allowed to raise an exception for this, any consumer that wanted a
> data-neutral view of the data would still have to pass Py_BUF_FORMAT to
> guard against this. Wouldn't that be ironic?
I agree that NumPy would not do this, as it would allow unformatted
views. In fact, most exporters would probably choose not to raise an
error. But an exporter that really only wants its data viewed as,
e.g., complex numbers would raise an error to force a consumer to be
explicit (by providing the Py_BUF_FORMAT flag) about bypassing that
desire.
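To make the two exporter policies concrete, here is a small sketch. Both classes, the get_buffer() method name, and the format placeholders are assumptions made up for illustration; the proposal only says that the choice is left to the exporter.

```python
# Flag values as listed earlier in this message.
Py_BUF_SIMPLE = 0x00
Py_BUF_FORMAT = 0x04

# Placeholder format descriptions; the actual format strings are not
# specified in this message.
FLOAT_FORMAT = "d"
COMPLEX_FORMAT = "complex"


class LenientExporter:
    """Hypothetical exporter that allows unformatted views of its data."""

    def get_buffer(self, flags):
        data = bytes(16)
        fmt = FLOAT_FORMAT if flags & Py_BUF_FORMAT else None
        return data, fmt


class StrictComplexExporter:
    """Hypothetical exporter that insists the consumer ask for the format."""

    def get_buffer(self, flags):
        if not flags & Py_BUF_FORMAT:
            raise BufferError("this exporter requires Py_BUF_FORMAT")
        return bytes(16), COMPLEX_FORMAT


print(LenientExporter().get_buffer(Py_BUF_SIMPLE))
try:
    StrictComplexExporter().get_buffer(Py_BUF_SIMPLE)
except BufferError as exc:
    print("refused:", exc)
```

The lenient policy is the one described here for NumPy; the strict one is the rarer case of an exporter that wants consumers to be explicit.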
>
> Ok, but is the indexing row-major or column-major? That has to be
> decided.
I think it's called row-major, but I don't like that term because what
does it mean for an N-D array? I use 'last index varies the fastest' if I
want to be explicit, and C-contiguous if we know what we are talking
about. Yes, this is assumed in such cases.
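As an illustration of 'last index varies the fastest' for an N-D array, the following sketch computes byte strides for a C-contiguous layout. The function names are made up for this example.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides for a C-contiguous array: the last index moves by one item."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def byte_offset(index, strides):
    """Byte offset of the element at `index` from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


strides = c_contiguous_strides((2, 3, 4), itemsize=8)
print(strides)                           # (96, 32, 8)
print(byte_offset((0, 0, 1), strides))   # 8: stepping the last index moves one item
print(byte_offset((1, 0, 0), strides))   # 96: stepping the first index skips a 3x4 block
```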
-Travis