[Python-Dev] PEP 3118: Extended buffer protocol (new version)

Travis Oliphant oliphant.travis at ieee.org
Fri Apr 13 09:03:04 CEST 2007


Carl Banks wrote:
>
> The thing that bothers me about this whole flags setup is that 
> different flags can do opposite things.
>
> Some of the flags RESTRICT the kind of buffers that can be
> exported (Py_BUF_WRITABLE); other flags EXPAND the kind of buffers that
> can be exported (Py_BUF_INDIRECT).  That is highly confusing and I'm -1
> on any proposal that includes both behaviors.  (Mutually exclusive sets
> of flags are a minor exception: they can be thought of as either
> RESTRICTING or EXPANDING, so they could be mixed with either.)
The mutually exclusive set is the one example of the restriction that 
you gave. 

I think the flags setup I've described is much closer to your Venn 
diagram concept than you give it credit for.   I've re-worded some of 
the discussion (see 
http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/numpy/doc/pep_buffer.txt 
) so that it is clearer that each flag is a description of what kind of 
buffer the consumer is prepared to deal with.

For example, if the consumer cares about what's 'in' the array, it uses 
Py_BUF_FORMAT.   Exporters are free to do what they want with this 
information.   I agree that NumPy would not force you to use its buffer 
only as a region of some specific type, but some other object may want 
to be much more restrictive and only export to consumers who will 
recognize the data stored for what it is.    I think it's up to the 
exporters to decide whether or not to raise an error when a certain kind 
of buffer is requested.

Basically, every flag corresponds to a different property of the buffer 
that the consumer is requesting:

Py_BUF_SIMPLE  --- you are requesting the simplest possible  (0x00)

Py_BUF_WRITEABLE --  get a writeable buffer   (0x01)

Py_BUF_READONLY --  get a read-only buffer    (0x02)

Py_BUF_FORMAT --  get a "formatted" buffer.   (0x04)

Py_BUF_SHAPE -- get a buffer with shape information  (0x08)

Py_BUF_STRIDES --  get a buffer with stride information (and shape)  (0x18)

Py_BUF_OFFSET -- get a buffer with suboffsets (and strides and shape) (0x38)

This is a logical sequence.  There is progression.  Each flag is a bit 
that indicates something about how the consumer can use the buffer.  In 
other words, the consumer states what kind of buffer is being 
requested.  The exporter obliges (and can save possibly significant time 
if the consumer is not requesting the information it must otherwise 
produce).
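
As a rough illustration of that progression, here is a minimal Python 
model of the flags (a sketch, not the C API itself).  The bit values are 
the ones listed above; the short constant names and the requests() 
helper are made up for this example.

```python
# Values copied from the list above, with the Py_BUF_ prefix dropped.
SIMPLE    = 0x00
WRITEABLE = 0x01
READONLY  = 0x02
FORMAT    = 0x04
SHAPE     = 0x08
STRIDES   = 0x18  # 0x10 | SHAPE: asking for strides implies shape
OFFSET    = 0x38  # 0x20 | STRIDES; also spelled Py_BUF_OFFSETS in this message


def requests(flags, wanted):
    """Hypothetical helper: True if every bit of `wanted` is set in `flags`."""
    return (flags & wanted) == wanted


# A consumer that can cope with suboffsets and wants a writeable buffer.
consumer_flags = OFFSET | WRITEABLE
print(requests(consumer_flags, STRIDES))  # strides are implied by OFFSET
print(requests(consumer_flags, FORMAT))   # format was never asked for
```

Because each later value contains the earlier bits, an exporter only has 
to test for the members it is about to fill in.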

> I originally suggested a small set of flags that expand the set of 
> allowed buffers.  Here's a little Venn diagram of buffers to 
> illustrate what I was thinking:
>
> http://www.aerojockey.com/temp/venn.png
>
> With no flags, the only buffers allowed to be returned are in the "All"
> circle but no others.  Add Py_BUF_WRITABLE and now you can export
> writable buffers as well.  Add Py_BUF_STRIDED and the strided circle is
> opened to you, and so on.
>
> My recommendation is, any flag should turn on some circle in the Venn
> diagram (it could be a circle I didn't draw--shaped arrays, for
> example--but it should be *some* circle).
I don't think your Venn diagram is broad enough, and it unnecessarily 
limits the use of flags to communicate between consumer and exporter.   
We don't have to force these flags into that point of view for them to 
be productive.    If you have a specific alternative proposal, or 
specific criticisms, then I'm very willing to hear them.   

I've thought through the flags again, and I'm not sure how I would 
change them.  They make sense to me, especially in light of past 
usages of the buffer protocol (where most people request read-or-write 
buffers, i.e. Py_BUF_SIMPLE).  I'm also not sure our mental diagrams are 
both oriented the same way.  For me, the most restrictive requests are

Py_BUF_WRITEABLE | Py_BUF_FORMAT and Py_BUF_READONLY | Py_BUF_FORMAT

The most un-restrictive request (the largest circle in my mental Venn 
diagram) is

Py_BUF_OFFSETS followed by Py_BUF_STRIDES followed by Py_BUF_SHAPE

Adding Py_BUF_FORMAT, Py_BUF_WRITEABLE, or Py_BUF_READONLY serves to 
restrict any of the other circles.

Is this dual use of flags what bothers you?  That is, the use of some 
flags to restrict circles in your Venn diagram that are turned on by 
other flags, so that you say Py_BUF_OFFSETS | Py_BUF_WRITEABLE to get 
the intersection of the largest circle (Py_BUF_OFFSETS) with the 
WRITEABLE subset?

Such concerns are not convincing to me.  Just don't think of the flags 
in that way.  Think of them as turning "on" members of the bufferinfo 
structure.  
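
To make the "turning on members" view concrete, here is a small Python 
sketch of a hypothetical exporter.  The member names (len, format, 
shape, strides, suboffsets), the fill_bufferinfo() function, and the 
negative-suboffset convention are assumptions made for illustration; 
only the bit values come from the list earlier in this message.

```python
# Bit values as listed earlier in this message (Py_BUF_ prefix dropped).
FORMAT, SHAPE, STRIDES, OFFSET = 0x04, 0x08, 0x18, 0x38


def fill_bufferinfo(flags):
    """Hypothetical exporter for a C-contiguous 2x3 array of 8-byte doubles.

    Only the members the consumer asked for are filled in; the rest stay None.
    """
    info = {"len": 2 * 3 * 8, "format": None, "shape": None,
            "strides": None, "suboffsets": None}
    if (flags & FORMAT) == FORMAT:
        info["format"] = "d"            # struct-style code for a C double
    if (flags & SHAPE) == SHAPE:
        info["shape"] = (2, 3)
    if (flags & STRIDES) == STRIDES:
        info["strides"] = (24, 8)       # bytes to step along each axis
    if (flags & OFFSET) == OFFSET:
        info["suboffsets"] = (-1, -1)   # assumed: negative means no indirection
    return info


simple = fill_bufferinfo(0x00)               # Py_BUF_SIMPLE: just the memory
strided = fill_bufferinfo(STRIDES | FORMAT)  # shape comes along with strides
```

Under this reading a flag never changes what the memory is; it only 
decides which descriptive members the exporter bothers to produce.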

>
>
>>>> Py_BUF_FORMAT
>>>>    The consumer will be using the format string information so make 
>>>> sure that    member is filled correctly. 
>>>
>>> Is the idea to throw an exception if there's some other data format 
>>> besides "b", and this flag isn't set?  It seems superfluous otherwise.
>>
>> The idea is that a consumer may not care about the format and the 
>> exporter may want to know that to simplify the interface.    In other 
>> words the flag is a way for the consumer to communicate that it wants 
>> format information (or not).
>
> I'm -1 on using the flags for this.  It's completely out of character
> compared to the rest of the flags.  All other flags are there for the
> benefit of the consumer; this flag is useless to the consumer.
>
> More concretely, all the rest of the flags are there to tell the 
> exporter what kind of buffer they're prepared to accept.  This flag, 
> alone, does not do that.
I agree. This flag is used by the consumer to state that it wants, will 
be making note of,  and is prepared to deal with a "formatted" buffer.   
I think it's short-sighted to have flags to control providing the other 
members of the PyBuffer structure and not this one.  

Actually,  the "rare" optimization to the exporter can still be 
significant if most consumers don't care about its format (which 
perhaps it has to construct at request time).

>> If the exporter wants to raise an exception if the format is not
>> requested is up to the exporter.
>
> That seems like a bad idea.  Suppose I have a contiguous numpy array of
> floats and I want to view it as a sequence of bytes.  If the exporter's
> allowed to raise an exception for this, any consumer that wanted a
> data-neutral view of the data would still have to pass Py_BUF_FORMAT to
> guard against this.  Wouldn't that be ironic?

I agree that NumPy would not do this, as it would allow un-formatted 
views.  In fact, most exporters would probably choose not to raise an 
error.  But an exporter that really only wants its data viewed as, 
e.g., complex numbers would raise an error to force a consumer to be 
explicit (by providing the Py_BUF_FORMAT flag) about bypassing that 
desire.
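
The two exporter policies can be sketched in Python as follows.  Both 
classes, the get_buffer() method name, and the "Zd" format string for 
complex doubles are assumptions made for this example; the point is 
only that the choice to raise belongs to the exporter.

```python
FORMAT = 0x04  # Py_BUF_FORMAT, value as listed earlier in this message


class LenientExporter:
    """Hypothetical NumPy-like exporter: un-formatted views are fine."""

    def get_buffer(self, flags):
        wants_format = (flags & FORMAT) == FORMAT
        return {"format": "d" if wants_format else None}


class StrictComplexExporter:
    """Hypothetical exporter that wants its data seen as complex numbers."""

    def get_buffer(self, flags):
        if (flags & FORMAT) != FORMAT:
            # The consumer did not acknowledge the format, so refuse.
            raise BufferError("consumer must request format information")
        return {"format": "Zd"}  # assumed code for a complex double


def try_export(exporter, flags):
    """Return the exporter's buffer info, or None if it refused."""
    try:
        return exporter.get_buffer(flags)
    except BufferError:
        return None
```

A consumer that wants raw bytes from the strict exporter still passes 
Py_BUF_FORMAT and then simply ignores the format it gets back.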

>
> Ok, but is the indexing row-major or column-major?  That has to be 
> decided.
I think it's called row-major, but I don't like that term because what 
does it mean for an N-D array?  I use 'last index varies the fastest' if 
I want to be explicit, and C-contiguous if we know what we are talking 
about.   Yes, this is assumed in such cases.
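
For an N-D array, 'last index varies the fastest' can be spelled out 
with a short Python sketch.  The function names are made up for this 
example, and the strides are expressed in bytes.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides for a C-contiguous array: the last axis steps by itemsize."""
    strides = [0] * len(shape)
    step = itemsize
    # Walk the axes from last to first; each earlier axis must skip over
    # everything spanned by the axes after it.
    for axis in range(len(shape) - 1, -1, -1):
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


def byte_offset(index, strides):
    """Byte offset of the element at `index` from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


strides = c_contiguous_strides((2, 3, 4), itemsize=8)
print(strides)                          # (96, 32, 8)
print(byte_offset((0, 0, 1), strides))  # 8: next element along the last axis
print(byte_offset((1, 0, 0), strides))  # 96: a full 3x4 block further on
```

Incrementing the last index moves to the adjacent item in memory, which 
is what row-major means once there are more than two dimensions.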



-Travis


