Re: [Python-Dev] Understanding the buffer API

4 Aug 2012

Jeff Allen wrote:
...
I'd like to lay a solid foundation that benefits from the
recent CPython work. I hope that some of the complexity in
memoryview stems from legacy considerations I don't have to deal
with in Jython.
I'm afraid not: PEP-3118 is really that complex. ;)
...
My understanding is this: When a consumer requests a buffer from the
exporter it specifies using flags how it intends to navigate it. If
the buffer actually needs more apparatus than the consumer proposes,
this raises an exception. If the buffer needs less apparatus than
the consumer proposes, the exporter has to supply what was asked
for.  For example, if the consumer sets PyBUF_STRIDES, and the
buffer can only be navigated by using suboffsets (PIL-style) this
raises an exception. Alternatively, if the consumer sets
PyBUF_STRIDES, and the buffer is just a simple byte array, the
exporter has to supply shape and strides arrays (with trivial
values), since the consumer is going to use those arrays.
Yes.
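
A minimal sketch of that negotiation, as a toy model in Python rather than the C API. The flag values mirror the CPython header; export_bytes() and its needs_suboffsets parameter are hypothetical names used only for illustration:

```python
# Flag values as defined in the CPython headers.
PyBUF_SIMPLE = 0
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES


def export_bytes(nbytes, flags, needs_suboffsets=False):
    """Toy exporter (hypothetical) for a one-dimensional byte buffer."""
    # The buffer needs more apparatus than the consumer proposed: refuse.
    if needs_suboffsets and (flags & PyBUF_INDIRECT) != PyBUF_INDIRECT:
        raise BufferError("buffer requires suboffsets")

    view = {"len": nbytes, "itemsize": 1,
            "shape": None, "strides": None, "suboffsets": None}
    # The buffer needs less apparatus than requested: supply trivial
    # values anyway, because the consumer is going to use them.
    if (flags & PyBUF_ND) == PyBUF_ND:
        view["shape"] = (nbytes,)
    if (flags & PyBUF_STRIDES) == PyBUF_STRIDES:
        view["strides"] = (1,)
    return view


simple = export_bytes(10, PyBUF_SIMPLE)
strided = export_bytes(10, PyBUF_STRIDES)
```

Here the PyBUF_SIMPLE request leaves shape and strides unset, the PyBUF_STRIDES request gets trivial one-element tuples, and a PIL-style buffer requested with PyBUF_STRIDES raises BufferError.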
...
Is there any harm in supplying shape and strides when they were not
requested? The PEP says: "PyBUF_ND ... If this is not given then
shape will be NULL". It doesn't stipulate that strides will be null
if PyBUF_STRIDES is not given, but the library documentation says
so. suboffsets is different since even when requested, it will be
null if not needed.
You are right that the PEP does not explicitly state that rule for
strides. However, NULL always has an implied meaning:

  format=NULL  ->  treat the buffer as unsigned bytes.

  shape=NULL   ->  one-dimensional AND treat the buffer as unsigned bytes.

  strides=NULL ->  C-contiguous

I think relaxing the NULL rule for strides would complicate things,
since it would introduce yet another special case.
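
To make the implied meanings concrete, here is a sketch of how a consumer might reconstruct the missing values, following the reading of NULL given above. fill_in() and c_contiguous_strides() are hypothetical helpers, with None standing in for NULL; this is not what any particular implementation does:

```python
def c_contiguous_strides(shape, itemsize):
    """Strides in bytes for a C-contiguous (row-major) array."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def fill_in(length, itemsize, fmt=None, shape=None, strides=None):
    """Replace None (NULL) fields by their implied values."""
    if shape is None:
        # shape=NULL: one-dimensional AND unsigned bytes.
        fmt, itemsize, shape = "B", 1, (length,)
    elif fmt is None:
        # format=NULL: unsigned bytes. itemsize is left as reported,
        # so it may disagree with the format.
        fmt = "B"
    if strides is None:
        # strides=NULL: C-contiguous.
        strides = c_contiguous_strides(shape, itemsize)
    return fmt, itemsize, shape, strides


as_bytes = fill_in(24, 4)
as_2d = fill_in(24, 4, fmt="i", shape=(2, 3))
```

With these inputs, a 24-byte area reported with everything NULL comes back as 24 unsigned bytes, while the 2x3 array of 4-byte items gets strides (12, 4).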
...
Similar, but simpler, the PEP says "PyBUF_FORMAT ... If format is
not explicitly requested then the format must be returned as NULL
(which means "B", or unsigned bytes)". What would be the harm in
returning "B"?
Ah, yes. The key here is this:

"This would be used when the consumer is going to be checking for what
 'kind' of data is actually stored."

Conversely, if not requested, format=NULL indicates that the real
format may be e.g. 'L', but the consumer wants to treat the buffer
as unsigned bytes. This works because the 'len' field stores the
length of the memory area in bytes (for contiguous buffers at least).

The 'itemsize' field may be wrong though in this special case.

In general, format=NULL is a cast of a (possibly multi-dimensional)
C-contiguous buffer to a one-dimensional buffer of unsigned bytes.
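
At the Python level, the closest visible analogue is memoryview.cast('B'). This is only an illustration of the idea, not a claim that cast() and a format=NULL request take the same code path:

```python
import struct

# Six doubles packed into a bytes object, viewed as a 2x3 array.
raw = memoryview(struct.pack("6d", *range(6)))
matrix = raw.cast("d", shape=[2, 3])

# Cast the multi-dimensional C-contiguous view to 1-D unsigned bytes.
flat = matrix.cast("B")

summary = (flat.ndim, flat.format, flat.itemsize, flat.shape)
same_memory = flat.nbytes == matrix.nbytes
```

The byte length is the same in both views; only ndim, format, itemsize and shape change.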

IMO only the following combinations make sense. These two are self explanatory:

   1) shape=NULL, format=NULL    ->  e.g. PyBUF_SIMPLE

   2) shape!=NULL, format!=NULL  ->  e.g. PyBUF_FULL

Combination 1) can break the invariant product(shape) * itemsize = len.

The next combination exists as part of PyBUF_STRIDED:

   3) shape!=NULL, format=NULL.

It can break two invariants (product(shape) * itemsize = len,
calcsize(format) = itemsize), but since it's explicitly part of
PyBUF_STRIDED, memoryview_getbuf() allows it.

The remaining combination is disallowed, since the buffer is already assumed to
be unsigned bytes:

   4) shape=NULL, format!=NULL.
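
The two invariants can be checked from Python. In the sketch below, invariants_hold() is a hypothetical helper; the partial exports are simulated by passing None for the fields that would be NULL while keeping the original itemsize, which is the situation described above:

```python
import math
import struct
from array import array


def invariants_hold(length, itemsize, shape, fmt):
    """Check product(shape) * itemsize == len and calcsize(format) == itemsize."""
    shape = (length,) if shape is None else shape   # shape=NULL: 1-D bytes
    fmt = "B" if fmt is None else fmt               # format=NULL: unsigned bytes
    return (math.prod(shape) * itemsize == length
            and struct.calcsize(fmt) == itemsize)


full = memoryview(array("i", range(6)))

# 2) shape!=NULL, format!=NULL: a full export.
full_ok = invariants_hold(full.nbytes, full.itemsize, full.shape, full.format)
# 1) shape=NULL, format=NULL, but itemsize still that of the real format.
simple_ok = invariants_hold(full.nbytes, full.itemsize, None, None)
# 3) shape!=NULL, format=NULL.
strided_ok = invariants_hold(full.nbytes, full.itemsize, full.shape, None)
```

Only the full export satisfies both checks in this example.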
...
One place where this really matters is in the implementation of
memoryview. PyMemoryView requests a buffer with the flags
PyBUF_FULL_RO, so even a simple byte buffer export will come with
shape, strides and format. A consumer (of the memoryview's buffer
API) might specify PyBUF_SIMPLE: according to the PEP I can't simply
give it the original buffer since required fields (that the consumer
will presumably not access) are not NULL. In practice, I'd like to:
what could possibly go wrong?
Because of all the implied meanings of NULL, I think the safest way is
to implement memoryview_getbuf() for Jython. After all the PEP describes
a protocol, so everyone should really be doing the same thing.

Whether the protocol needs to be that complex is another question.
Partially initialized buffers are a pain to handle on the C level
since it is necessary to reconstruct the missing values -- at least if
you want to keep your sanity :).

I think the protocol would benefit from changing the getbuffer rules to:

   a) The buffer gets a 'flags' field that can store properties like
      PyBUF_SIMPLE, PyBUF_C_CONTIGUOUS etc.

   b) The exporter must *always* provide full information.

   c) If a buffer can be exported as unsigned bytes but has a different
      layout, the exporter must perform a full cast so that the above
      mentioned invariants are kept.

      The disadvantage of this is that the original layout is lost for
      the consumer. I do not know if there is a use case that requires
      the consumer to have the original layout information.
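
A rough sketch of what rules a) to c) could look like, purely as an illustration. The FullBuffer class and cast_to_bytes() are hypothetical and not part of any existing API; only the flag values are taken from the CPython headers:

```python
import math
import struct
from dataclasses import dataclass

PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_C_CONTIGUOUS = 0x0020 | PyBUF_STRIDES


@dataclass(frozen=True)
class FullBuffer:
    """Hypothetical export that always carries full information (rule b)."""
    length: int
    itemsize: int
    format: str
    shape: tuple
    strides: tuple
    flags: int          # rule a): properties stored in the buffer itself

    def invariants_hold(self):
        return (math.prod(self.shape) * self.itemsize == self.length
                and struct.calcsize(self.format) == self.itemsize)


def cast_to_bytes(buf):
    """Rule c): a full cast to unsigned bytes that keeps the invariants."""
    if (buf.flags & PyBUF_C_CONTIGUOUS) != PyBUF_C_CONTIGUOUS:
        raise BufferError("only C-contiguous buffers can be cast to bytes")
    return FullBuffer(length=buf.length, itemsize=1, format="B",
                      shape=(buf.length,), strides=(1,),
                      flags=PyBUF_C_CONTIGUOUS)


original = FullBuffer(48, 8, "d", (2, 3), (24, 8), PyBUF_C_CONTIGUOUS)
as_bytes = cast_to_bytes(original)
```

Both objects satisfy the invariants, but as_bytes no longer carries the 2x3 layout, which is the disadvantage noted under c).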

Stefan Krah