Jeff Allen <ja...py@farowl.co.uk> wrote:
I'd like to lay a solid foundation that benefits from the recent CPython work. I hope that some of the complexity in memoryview stems from legacy considerations I don't have to deal with in Jython.
I'm afraid not: PEP-3118 is really that complex. ;)
My understanding is this: when a consumer requests a buffer from the exporter, it specifies, using flags, how it intends to navigate it. If the buffer actually needs more apparatus than the consumer proposes, this raises an exception. If the buffer needs less apparatus than the consumer proposes, the exporter has to supply what was asked for. For example, if the consumer sets PyBUF_STRIDES and the buffer can only be navigated by using suboffsets (PIL-style), this raises an exception. Alternatively, if the consumer sets PyBUF_STRIDES and the buffer is just a simple byte array, the exporter has to supply shape and strides arrays (with trivial values), since the consumer is going to use those arrays.
Yes.
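The rule confirmed here can be sketched in Python, with a hypothetical `View` class standing in for the C `Py_buffer` struct and `None` standing in for NULL. The flag constants are meant to mirror CPython's definitions (only their bit relationships matter, e.g. PyBUF_STRIDES includes PyBUF_ND); the exporters `export_flat_bytes()` and `export_pil_style()` are invented for illustration and are not part of any real API. A flat byte array fills in trivial shape and strides when asked, while a PIL-style buffer refuses any request that lacks PyBUF_INDIRECT.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Flag bits as we understand them from CPython's Include/object.h; only the
# bit relationships (e.g. STRIDES includes ND) matter for this sketch.
PyBUF_SIMPLE = 0
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES
PyBUF_FULL_RO = PyBUF_INDIRECT | PyBUF_FORMAT


def requested(flags: int, req: int) -> bool:
    # A composite request counts only if all of its bits are present.
    return (flags & req) == req


@dataclass(frozen=True)
class View:
    """Hypothetical stand-in for Py_buffer; None plays the role of NULL."""
    length: int
    itemsize: int
    format: Optional[str]
    shape: Optional[Tuple[int, ...]]
    strides: Optional[Tuple[int, ...]]
    suboffsets: Optional[Tuple[int, ...]]


def export_flat_bytes(data: bytes, flags: int) -> View:
    """Hypothetical exporter of a plain byte array: it needs no apparatus."""
    n = len(data)
    return View(
        length=n,
        itemsize=1,
        format="B" if requested(flags, PyBUF_FORMAT) else None,
        # The consumer will read shape/strides if it asked, so supply trivial ones.
        shape=(n,) if requested(flags, PyBUF_ND) else None,
        strides=(1,) if requested(flags, PyBUF_STRIDES) else None,
        suboffsets=None,
    )


def export_pil_style(rows: int, cols: int, flags: int) -> View:
    """Hypothetical exporter whose rows are reachable only via a pointer array."""
    if not requested(flags, PyBUF_INDIRECT):
        # The buffer needs more apparatus than the consumer proposes.
        raise BufferError("buffer requires suboffsets")
    return View(
        length=rows * cols,
        itemsize=1,
        format="B" if requested(flags, PyBUF_FORMAT) else None,
        shape=(rows, cols),
        strides=(8, 1),       # assumes 8-byte pointers in the first dimension
        suboffsets=(0, -1),   # dereference after the first dimension only
    )
```

For example, `export_flat_bytes(b"abc", PyBUF_STRIDES)` yields shape `(3,)` and strides `(1,)` but no format, while `export_pil_style(2, 3, PyBUF_STRIDES)` raises `BufferError`.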
Is there any harm in supplying shape and strides when they were not requested? The PEP says: "PyBUF_ND ... If this is not given then shape will be NULL". It doesn't stipulate that strides will be NULL if PyBUF_STRIDES is not given, but the library documentation says so. suboffsets is different: even when requested, it will be NULL if not needed.
You are right that the PEP does not explicitly state that rule for strides. However, NULL always has an implied meaning:

format=NULL -> treat the buffer as unsigned bytes.
shape=NULL -> one-dimensional AND treat the buffer as unsigned bytes.
strides=NULL -> C-contiguous.

I think relaxing the NULL rule for strides would complicate things, since it would introduce yet another special case.
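The implied meanings of NULL amount to defaults that a consumer must reconstruct. A minimal sketch, assuming `None` stands for NULL; `with_defaults()` and `c_contiguous_strides()` are hypothetical helpers, not library functions. The stride rule is the usual row-major one: the last index varies fastest, so each stride is the itemsize times the product of all later extents.

```python
from typing import Optional, Tuple


def c_contiguous_strides(shape: Tuple[int, ...], itemsize: int) -> Tuple[int, ...]:
    """Strides of a C-contiguous (row-major) array: the last index varies fastest."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def with_defaults(
    length: int,
    itemsize: int,
    fmt: Optional[str],
    shape: Optional[Tuple[int, ...]],
    strides: Optional[Tuple[int, ...]],
) -> Tuple[str, int, Tuple[int, ...], Tuple[int, ...]]:
    """Replace NULL (None) fields by the values they imply (hypothetical helper)."""
    if shape is None:
        # shape=NULL: one-dimensional AND unsigned bytes, so the item size is 1.
        return "B", 1, (length,), (1,)
    if fmt is None:
        fmt = "B"  # format=NULL: treat the buffer as unsigned bytes
    if strides is None:
        # strides=NULL: C-contiguous, computed in bytes from the item size.
        strides = c_contiguous_strides(shape, itemsize)
    return fmt, itemsize, shape, strides
```

As a cross-check, a real `memoryview(bytes(48)).cast("d", [2, 3])` reports strides `(24, 8)`, the same as `c_contiguous_strides((2, 3), 8)`.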
A similar but simpler case: the PEP says "PyBUF_FORMAT ... If format is not explicitly requested then the format must be returned as NULL (which means "B", or unsigned bytes)". What would be the harm in returning "B"?
Ah, yes. The key here is this: "This would be used when the consumer is going to be checking for what 'kind' of data is actually stored." Conversely, if not requested, format=NULL indicates that the real format may be e.g. 'L', but the consumer wants to treat the buffer as unsigned bytes. This works because the 'len' field stores the length of the memory area in bytes (for contiguous buffers at least). The 'itemsize' field may be wrong, though, in this special case. In general, format=NULL is a cast of a (possibly multi-dimensional) C-contiguous buffer to a one-dimensional buffer of unsigned bytes.

IMO only the following combinations make sense. These two are self-explanatory:

1) shape=NULL, format=NULL -> e.g. PyBUF_SIMPLE
2) shape!=NULL, format!=NULL -> e.g. PyBUF_FULL

Combination 1) can break the invariant product(shape) * itemsize = len!

The next combination exists as part of PyBUF_STRIDED:

3) shape!=NULL, format=NULL

It can break two invariants (product(shape) * itemsize = len, calcsize(format) = itemsize), but since it's explicitly part of PyBUF_STRIDED, memoryview_getbuf() allows it.

The remaining combination is disallowed, since with shape=NULL the buffer is already assumed to be unsigned bytes:

4) shape=NULL, format!=NULL
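The four combinations and the product invariant can be checked mechanically. This is a sketch only: `combination()` and `product_invariant()` are hypothetical helpers, `None` stands for NULL, and shape=NULL is read as the implied shape `(len,)`. The concrete numbers come from a real `array("L")` and Python's `memoryview.cast("B")`, which performs an explicit cast to unsigned bytes rather than leaving format as NULL.

```python
from array import array
from math import prod
from typing import Optional, Tuple


def combination(shape: Optional[Tuple[int, ...]], fmt: Optional[str]) -> int:
    """Classify a view by which of shape/format are NULL (None), per the list above."""
    if shape is None and fmt is None:
        return 1  # e.g. PyBUF_SIMPLE
    if shape is not None and fmt is not None:
        return 2  # e.g. PyBUF_FULL
    if shape is not None:
        return 3  # shape present, format NULL: part of PyBUF_STRIDED
    # shape=NULL already implies unsigned bytes, so an explicit format contradicts it.
    raise BufferError("shape=NULL with format!=NULL is not allowed")


def product_invariant(length: int, itemsize: int, shape: Optional[Tuple[int, ...]]) -> bool:
    """product(shape) * itemsize == len, reading shape=NULL as (len,)."""
    effective_shape = shape if shape is not None else (length,)
    return prod(effective_shape) * itemsize == length


# A real 'L' array: its memoryview reports the original (multi-byte) item size.
full = memoryview(array("L", [1, 2, 3]))
# Combination 1 with the stale itemsize breaks the invariant...
broken = product_invariant(full.nbytes, full.itemsize, None)
# ...while an explicit cast to unsigned bytes keeps everything consistent.
as_bytes = full.cast("B")
consistent = product_invariant(as_bytes.nbytes, as_bytes.itemsize, as_bytes.shape)
```

Here `broken` is false and `consistent` is true, and `as_bytes.nbytes` equals `full.nbytes`: the byte length survives the cast, but the item size does not.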
One place where this really matters is in the implementation of memoryview. PyMemoryView requests a buffer with the flags PyBUF_FULL_RO, so even a simple byte buffer export will come with shape, strides and format. A consumer (of the memoryview's buffer API) might specify PyBUF_SIMPLE: according to the PEP, I can't simply give it the original buffer, since fields that the PEP requires to be NULL (and that the consumer will presumably not access) are not NULL. In practice, I'd like to: what could possibly go wrong?
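The situation described here can be modeled as deriving a consumer's view from the full view the memoryview holds. A rough sketch in the spirit of CPython's memoryview_getbuf(), assuming a hypothetical `View` class with `None` for NULL; `reexport()` and `is_c_contiguous()` are invented names, and the flag constants are meant to mirror CPython's. Rather than handing over the base view, it NULLs every field the consumer did not request, and refuses when dropping strides would misdescribe non-C-contiguous memory.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Flag bits as we understand them from CPython's Include/object.h.
PyBUF_SIMPLE = 0
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES
PyBUF_FULL_RO = PyBUF_INDIRECT | PyBUF_FORMAT


@dataclass(frozen=True)
class View:
    """Hypothetical stand-in for Py_buffer; None plays the role of NULL."""
    length: int
    itemsize: int
    format: Optional[str]
    shape: Optional[Tuple[int, ...]]
    strides: Optional[Tuple[int, ...]]
    suboffsets: Optional[Tuple[int, ...]] = None


def requested(flags: int, req: int) -> bool:
    return (flags & req) == req


def is_c_contiguous(v: View) -> bool:
    if v.suboffsets is not None:
        return False
    if v.length == 0 or v.shape is None or v.strides is None:
        return True  # NULL shape or strides already imply C order
    expected = v.itemsize
    for extent, stride in zip(reversed(v.shape), reversed(v.strides)):
        if extent > 1 and stride != expected:
            return False
        expected *= extent
    return True


def reexport(base: View, flags: int) -> View:
    """Derive the view handed to a consumer from a PyBUF_FULL_RO base view."""
    if not requested(flags, PyBUF_INDIRECT) and base.suboffsets is not None:
        raise BufferError("underlying buffer requires suboffsets")
    view = base
    if not requested(flags, PyBUF_STRIDES):
        # Without strides the consumer assumes C order, so that must be true.
        if not is_c_contiguous(base):
            raise BufferError("underlying buffer is not C-contiguous")
        view = replace(view, strides=None)
    if not requested(flags, PyBUF_ND):
        view = replace(view, shape=None)  # 1-D unsigned bytes from here on
    if not requested(flags, PyBUF_FORMAT):
        view = replace(view, format=None)
    return view
```

Note that `itemsize` is left untouched when shape and format are dropped, which is exactly the stale-itemsize case discussed earlier in the thread.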
Because of all the implied meanings of NULL, I think the safest way is to implement an equivalent of memoryview_getbuf() for Jython. After all, the PEP describes a protocol, so everyone should really be doing the same thing.

Whether the protocol needs to be that complex is another question. Partially initialized buffers are a pain to handle on the C level, since it is necessary to reconstruct the missing values -- at least if you want to keep your sanity :). I think the protocol would benefit from changing the getbuffer rules to:

a) The buffer gets a 'flags' field that can store properties like PyBUF_SIMPLE, PyBUF_C_CONTIGUOUS etc.
b) The exporter must *always* provide full information.
c) If a buffer can be exported as unsigned bytes but has a different layout, the exporter must perform a full cast so that the above-mentioned invariants are kept.

The disadvantage of this is that the original layout is lost for the consumer. I do not know if there is a use case that requires the consumer to have the original layout information.

Stefan Krah
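Proposals a) to c) above can be illustrated with a small sketch. Everything here is hypothetical: `Props` stands in for the proposed 'flags' field, `FullView` for a buffer whose fields are never NULL, and `full_cast_to_bytes()` for the exporter's full cast. It shows that after a full cast both invariants (product(shape) * itemsize = len and calcsize(format) = itemsize) hold, at the price of losing the original shape.

```python
import struct
from dataclasses import dataclass
from enum import IntFlag
from math import prod
from typing import Tuple


class Props(IntFlag):
    """Hypothetical property bits for the proposed 'flags' field, point a)."""
    C_CONTIGUOUS = 1
    F_CONTIGUOUS = 2


@dataclass(frozen=True)
class FullView:
    """Hypothetical buffer under point b): no field is ever left NULL."""
    length: int
    itemsize: int
    format: str
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    flags: Props


def invariants_hold(v: FullView) -> bool:
    return (prod(v.shape) * v.itemsize == v.length
            and struct.calcsize(v.format) == v.itemsize)


def full_cast_to_bytes(v: FullView) -> FullView:
    """Point c): re-describe a C-contiguous buffer as 1-D unsigned bytes."""
    if not v.flags & Props.C_CONTIGUOUS:
        raise BufferError("only C-contiguous buffers can be cast to bytes")
    # Every field is rewritten, so both invariants hold; the old layout is gone.
    return FullView(
        length=v.length,
        itemsize=1,
        format="B",
        shape=(v.length,),
        strides=(1,),
        flags=Props.C_CONTIGUOUS | Props.F_CONTIGUOUS,  # 1-D bytes are both
    )
```

A 2x3 array of doubles, for instance, comes out as shape `(48,)` with format `"B"`; nothing in the result records that it was once two rows of three.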