[Python-Dev] Understanding the buffer API

Stefan Krah stefan at bytereef.org
Sat Aug 4 11:11:50 CEST 2012


Jeff Allen <ja...py at farowl.co.uk> wrote:
> I'd like to lay a solid foundation that benefits from the
> recent CPython work. I hope that some of the complexity in
> memoryview stems from legacy considerations I don't have to deal
> with in Jython.

I'm afraid not: PEP-3118 is really that complex. ;)


> My understanding is this: When a consumer requests a buffer from the
> exporter it specifies using flags how it intends to navigate it. If
> the buffer actually needs more apparatus than the consumer proposes,
> this raises an exception. If the buffer needs less apparatus than
> the consumer proposes, the exporter has to supply what was asked
> for.  For example, if the consumer sets PyBUF_STRIDES, and the
> buffer can only be navigated by using suboffsets (PIL-style) this
> raises an exception. Alternatively, if the consumer sets
> PyBUF_STRIDES, and the buffer is just a simple byte array, the
> exporter has to supply shape and strides arrays (with trivial
> values), since the consumer is going to use those arrays.

Yes.


> Is there any harm in supplying shape and strides when they were not
> requested? The PEP says: "PyBUF_ND ... If this is not given then
> shape will be NULL". It doesn't stipulate that strides will be null
> if PyBUF_STRIDES is not given, but the library documentation says
> so. suboffsets is different since even when requested, it will be
> null if not needed.

You are right that the PEP does not explicitly state that rule for
strides. However, NULL always has an implied meaning:

  format=NULL  ->  treat the buffer as unsigned bytes.

  shape=NULL   ->  one-dimensional AND treat the buffer as unsigned bytes.

  strides=NULL ->  C-contiguous
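
A minimal sketch of how these NULL fields can be observed from Python,
assuming CPython 3.3 or later and the Py_buffer field order and flag
values found in the CPython headers. The PyBuffer class and the request()
helper are made-up names for this illustration, not part of any API:

```python
import ctypes
from array import array

class PyBuffer(ctypes.Structure):
    # Assumed to mirror the C struct Py_buffer of CPython 3.3+.
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("suboffsets", ctypes.POINTER(ctypes.c_ssize_t)),
        ("internal", ctypes.c_void_p),
    ]

# Request flags, with the values used by the CPython headers.
PyBUF_SIMPLE = 0
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES
PyBUF_FULL_RO = PyBUF_INDIRECT | PyBUF_FORMAT

_getbuffer = ctypes.pythonapi.PyObject_GetBuffer
_getbuffer.argtypes = [ctypes.py_object, ctypes.POINTER(PyBuffer), ctypes.c_int]
_getbuffer.restype = ctypes.c_int
_release = ctypes.pythonapi.PyBuffer_Release
_release.argtypes = [ctypes.POINTER(PyBuffer)]
_release.restype = None

def request(exporter, flags):
    """Request a buffer with the given flags and report the pointer fields."""
    view = PyBuffer()
    _getbuffer(exporter, ctypes.byref(view), flags)
    try:
        n = view.ndim
        return {
            "len": view.len,
            "format": view.format,  # None stands for NULL
            "shape": [view.shape[i] for i in range(n)] if view.shape else None,
            "strides": [view.strides[i] for i in range(n)] if view.strides else None,
        }
    finally:
        _release(ctypes.byref(view))

simple = request(b"abc", PyBUF_SIMPLE)               # nothing requested
full = request(b"abc", PyBUF_FULL_RO)                # everything requested
words = request(array("I", [1, 2, 3]), PyBUF_SIMPLE)
print(simple)
print(full)
print(words)
```

With a bytes exporter, the PyBUF_SIMPLE request should come back with
format, shape and strides all NULL, while the PyBUF_FULL_RO request
carries the trivial values. The array request shows 'len' counting bytes
rather than items.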


I think relaxing the NULL rule for strides would complicate things,
since it would introduce yet another special case.


> Similar, but simpler, the PEP says "PyBUF_FORMAT ... If format is
> not explicitly requested then the format must be returned as NULL
> (which means "B", or unsigned bytes)". What would be the harm in
> returning "B"?

Ah, yes. The key here is this:

"This would be used when the consumer is going to be checking for what
 'kind' of data is actually stored."


Conversely, if not requested, format=NULL indicates that the real
format may be e.g. 'L', but the consumer wants to treat the buffer
as unsigned bytes. This works because the 'len' field stores the
length of the memory area in bytes (for contiguous buffers at least).

The 'itemsize' field may be wrong though in this special case.

In general, format=NULL is a cast of a (possibly multi-dimensional)
C-contiguous buffer to a one-dimensional buffer of unsigned bytes.


IMO only the following combinations make sense. These two are
self-explanatory:

   1) shape=NULL, format=NULL    ->  e.g. PyBUF_SIMPLE

   2) shape!=NULL, format!=NULL  ->  e.g. PyBUF_FULL


Note that 1) can break the invariant product(shape) * itemsize = len.


The next combination exists as part of PyBUF_STRIDED:

   3) shape!=NULL, format=NULL.

It can break two invariants (product(shape) * itemsize = len,
calcsize(format) = itemsize), but since it's explicitly part of
PyBUF_STRIDED, memoryview_getbuf() allows it.


The remaining combination is disallowed, since shape=NULL already implies
that the buffer is treated as unsigned bytes:

   4) shape=NULL, format!=NULL. 
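
A rough sketch of the two invariants and the four combinations, using
plain Python values in place of a real Py_buffer. The function names are
hypothetical, None stands for NULL, and reading shape=NULL as (len,) is an
assumption of this sketch:

```python
import struct
from math import prod

def combination(shape, fmt):
    """Return the number (1-4) of the shape/format combination."""
    if shape is None and fmt is None:
        return 1
    if shape is not None and fmt is not None:
        return 2
    if shape is not None:
        return 3
    return 4  # shape=NULL, format!=NULL: disallowed

def check_invariants(length, itemsize, shape=None, fmt=None):
    """Return (shape_ok, format_ok) for a hypothetical buffer description."""
    # NULL shape is read as one dimension of 'length' bytes (assumption).
    effective_shape = (length,) if shape is None else tuple(shape)
    # NULL format is read as unsigned bytes.
    effective_fmt = "B" if fmt is None else fmt
    shape_ok = prod(effective_shape) * itemsize == length
    format_ok = struct.calcsize(effective_fmt) == itemsize
    return shape_ok, format_ok

# Three 4-byte items, so len is 12 bytes in every case.
print(check_invariants(12, 4, (3,), "=I"))  # (True, True): combination 2
print(check_invariants(12, 4, (3,), None))  # (True, False): 3), real itemsize kept
print(check_invariants(12, 1, (3,), None))  # (False, True): 3), itemsize set to 1
print(check_invariants(12, 4, None, None))  # (False, False): 1), real itemsize kept
```

In combination 3) one of the two invariants fails whichever itemsize the
exporter reports, which is why a consumer cannot rely on them there.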



> One place where this really matters is in the implementation of
> memoryview. PyMemoryView requests a buffer with the flags
> PyBUF_FULL_RO, so even a simple byte buffer export will come with
> shape, strides and format. A consumer (of the memoryview's buffer
> API) might specify PyBUF_SIMPLE: according to the PEP I can't simply
> give it the original buffer since required fields (that the consumer
> will presumably not access) are not NULL. In practice, I'd like to:
> what could possibly go wrong?

Because of all the implied meanings of NULL, I think the safest way is
to implement memoryview_getbuf() for Jython. After all, the PEP describes
a protocol, so everyone should really be doing the same thing.


Whether the protocol needs to be that complex is another question.
Partially initialized buffers are a pain to handle on the C level
since it is necessary to reconstruct the missing values -- at least if
you want to keep your sanity :).


I think the protocol would benefit from changing the getbuffer rules to:

   a) The buffer gets a 'flags' field that can store properties like
      PyBUF_SIMPLE, PyBUF_C_CONTIGUOUS etc.

   b) The exporter must *always* provide full information.

   c) If a buffer can be exported as unsigned bytes but has a different
      layout, the exporter must perform a full cast so that the above
      mentioned invariants are kept.

      The disadvantage of this is that the original layout is lost for
      the consumer. I do not know if there is a use case that requires
      the consumer to have the original layout information.
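
As an illustration of what a full cast in the sense of c) looks like from
the Python side, here is a small sketch using memoryview.cast() from
CPython 3.3. It is only an analogue of the proposal, not an implementation
of it, and invariants_hold() is a made-up helper:

```python
import struct
from math import prod

def invariants_hold(view):
    """Check both invariants on a fully initialized memoryview."""
    return (prod(view.shape) * view.itemsize == view.nbytes
            and struct.calcsize(view.format) == view.itemsize)

# A 2x3 array of native unsigned ints on top of a zeroed byte buffer.
storage = bytearray(6 * struct.calcsize("I"))
typed = memoryview(storage).cast("I", shape=[2, 3])

# Full cast to unsigned bytes: format, itemsize and shape all change.
raw = typed.cast("B")

print(typed.format, typed.itemsize, typed.shape, invariants_hold(typed))
print(raw.format, raw.itemsize, raw.shape, invariants_hold(raw))
```

Both views satisfy the invariants, but the byte view no longer records
that the data was a 2x3 array of ints, which is the disadvantage noted
above.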



Stefan Krah



