memoryview: "B", "c", "b" format specifiers

Hello,
during my work on PEP-3118 fixes I noticed that memoryview does not handle the "B" format specifier according to the struct module documentation.
Here's what struct does:
Here's what memoryview does:
So, memoryview does exactly the opposite of what is specified. It should reject the bytes object but accept the integer.
I would like to fix this in the features/pep-3118 repository as follows:
- memoryview should respect the format specifiers.
- bytearray and friends should set the format specifier to "c" in their getbuffer() methods.
- Introduce a new function PyMemoryView_FromBytes() that can be used instead of PyMemoryView_FromBuffer(). PyMemoryView_FromBuffer() is usually used in conjunction with PyBuffer_FillInfo(), which sets the format specifier to "B".
Are there any general objections to this?
Stefan Krah
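The documented struct behaviour referred to in this message can be sketched as follows. This is an illustrative reconstruction, not the example from the original post: "B" packs an integer in range(256), while "c" packs a bytes object of length 1. The names `packed_int`, `packed_char` and `rejected` are made up for this sketch.

```python
import struct

# "B" (unsigned char) takes an integer in range(256).
packed_int = struct.pack("B", 97)

# "c" (char) takes a bytes object of length 1.
packed_char = struct.pack("c", b"a")

# Both calls produce the same single byte.
print(packed_int, packed_char)

# Swapping the argument types is rejected with struct.error.
rejected = []
for fmt, value in (("B", b"a"), ("c", 97)):
    try:
        struct.pack(fmt, value)
    except struct.error as exc:
        rejected.append(fmt)
        print(f"{fmt!r} rejects {value!r}: {exc}")
```

So for "B", struct rejects the bytes object and accepts the integer, which is the behaviour the message asks memoryview to follow.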

On Thu, 18 Aug 2011 18:22:54 +0200 Stefan Krah <stefan@bytereef.org> wrote:
> So, memoryview does exactly the opposite of what is specified. It should reject the bytes object but accept the integer.
Well, memoryview is quite dumb right now. It ignores the format and just considers its underlying memory a bytes sequence.
What would PyMemoryView_FromBytes() do? The name suggests it takes a bytes object, but you can already use PyMemoryView_FromObject() for that.
(I personally think the general bytes-as-sequence-of-ints behaviour is a mistake, so I wouldn't care much about an additional C API to enforce that behaviour :-))
Regards
Antoine.
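For comparison, here is a sketch of what a format-aware memoryview looks like from Python. It assumes CPython 3.3 or later, where memoryview checks item assignments against the format and offers cast(). It does not show the behaviour of the interpreter discussed in this thread, and the variable names are made up for illustration.

```python
buf = bytearray(2)
view = memoryview(buf)          # a bytearray exports format "B"
view[0] = 97                    # integers are accepted for "B"

try:
    view[1] = b"a"              # a bytes object is rejected for "B"
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

chars = view.cast("c")          # reinterpret the same memory as "c"
chars[1] = b"b"                 # "c" takes a bytes object of length 1

print(view.format, chars.format, bytes_accepted, bytes(buf))
```

Both views share the underlying bytearray, so the two assignments end up in the same two bytes.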

Antoine Pitrou <solipsis@pitrou.net> wrote:
Oh no, the name isn't quite right then. It should be a replacement for the combination PyBuffer_FillInfo()/PyMemoryView_FromBuffer() and it should temporarily wrap a C-string. Also, unlike that combination, it would set the format specifier to "c".
Perhaps this name is better:
  PyObject * PyMemoryView_FromCString(char *s, Py_ssize_t size, int flags);
'flags' is just PyBUF_READ or PyBUF_WRITE.
In the Python source tree, it could completely replace PyBuffer_FillInfo() and PyMemoryView_FromBuffer().
Stefan Krah

On Thu, 18 Aug 2011 18:57:00 +0200 Stefan Krah <stefan@bytereef.org> wrote:
Ah, nice.
> PyObject * PyMemoryView_FromCString(char *s, Py_ssize_t size, int flags);
It's not really a C string, since it's not null-terminated. PyMemoryView_FromMemory? (that would mirror PyUnicode_FromUnicode, for example)
> 'flags' is just PyBUF_READ or PyBUF_WRITE.
Why do we have these in addition to PyBUF_WRITABLE already?
Regards
Antoine.
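A function named PyMemoryView_FromMemory() with this signature exists in the C API of CPython 3.3 and later. The sketch below calls it from Python through ctypes purely for illustration. It assumes a CPython interpreter, and the numeric values of PyBUF_READ and PyBUF_WRITE are copied by hand from the C headers rather than taken from a Python module. The names `from_memory`, `raw`, `ro_view` and `rw_view` are made up.

```python
import ctypes

# Assumed values, copied by hand from CPython's C headers.
PyBUF_READ = 0x100
PyBUF_WRITE = 0x200

from_memory = ctypes.pythonapi.PyMemoryView_FromMemory
from_memory.restype = ctypes.py_object
from_memory.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int]

# A raw block of three bytes with no terminating NUL, owned by ctypes.
raw = (ctypes.c_ubyte * 3)(97, 98, 99)

# The views do not own the memory: 'raw' must outlive both of them.
ro_view = from_memory(ctypes.addressof(raw), len(raw), PyBUF_READ)
rw_view = from_memory(ctypes.addressof(raw), len(raw), PyBUF_WRITE)

rw_view[0] = ord("x")           # write through the writable view
print(ro_view.readonly, rw_view.readonly, ro_view.tobytes())
```

The write uses an integer because, as far as I can tell, the released function reports format "B" rather than the "c" proposed in this thread.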

Antoine Pitrou <solipsis@pitrou.net> wrote:
I see, yes. PyMemoryView_FromStringAndSize()? No, too much typing. I prefer PyMemoryView_FromMemory().
>> 'flags' is just PyBUF_READ or PyBUF_WRITE.
> Why do we have these in addition to PyBUF_WRITABLE already?
That's a bit involved, this is how I see it:
There are four buffer *request* flags that can be sent to a buffer provider and that indicate the amount of complexity that a consumer can handle (in decreasing order):
  PyBUF_INDIRECT -> suboffsets (PIL-style)
  PyBUF_STRIDES  -> strides (NumPy-style)
  PyBUF_ND       -> C-contiguous, but possibly multi-dimensional
  PyBUF_SIMPLE   -> contiguous, one-dimensional, unsigned bytes
Each of those flags can be mixed freely with two additional flags:
  PyBUF_WRITABLE
  PyBUF_FORMAT
All other buffer request flags are simply combinations of those. For example, if you use PyBUF_WRITABLE as the only flag, logically it should be seen as PyBUF_WRITABLE|PyBUF_SIMPLE (this works since PyBUF_SIMPLE is defined as 0).
PyBUF_READ and PyBUF_WRITE are so far only used for PyMemoryView_GetContiguous(). The PEP still has a flag named PyBUF_UPDATEIFCOPY, but that didn't make it into object.h.
I thought it might be appropriate to use PyBUF_READ and PyBUF_WRITE to underline the fact that you cannot send a fine-grained buffer request to PyMemoryView_FromMemory()[1]. Also, PyBUF_READ is easier to understand than PyBUF_SIMPLE. But I'd be equally happy with PyBUF_SIMPLE/PyBUF_WRITABLE.
Stefan Krah
[1] The terminology might sound funny, but there is a function that can act as a micro buffer provider:
  int PyBuffer_FillInfo(Py_buffer *view, PyObject *obj, void *buf, Py_ssize_t len, int readonly, int infoflags)
An exporter can use this function as a building block for a getbuffer() method for unsigned bytes, since it reacts correctly to *all* possible buffer requests in 'infoflags'.
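The way the request flags combine can be sketched with plain integers. The values below are copied by hand from CPython's object.h and should be treated as assumptions of this sketch; PyBUF_CONTIG and PyBUF_FULL are two of the composite flags that the C headers define as combinations.

```python
# Values copied by hand from CPython's object.h; treat them as assumptions.
PyBUF_SIMPLE = 0
PyBUF_WRITABLE = 0x0001
PyBUF_FORMAT = 0x0004
PyBUF_ND = 0x0008
PyBUF_STRIDES = 0x0010 | PyBUF_ND
PyBUF_INDIRECT = 0x0100 | PyBUF_STRIDES

# PyBUF_SIMPLE is 0, so adding it to a request changes nothing.
assert (PyBUF_WRITABLE | PyBUF_SIMPLE) == PyBUF_WRITABLE

# Each complexity level includes the bits of the level below it.
assert (PyBUF_STRIDES & PyBUF_ND) == PyBUF_ND
assert (PyBUF_INDIRECT & PyBUF_STRIDES) == PyBUF_STRIDES

# Composite requests are plain combinations of the six flags above.
PyBUF_CONTIG = PyBUF_ND | PyBUF_WRITABLE
PyBUF_FULL = PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT
print(hex(PyBUF_CONTIG), hex(PyBUF_FULL))
```

Because each level is a superset of the one below, a provider can test a single bit to learn whether the consumer can handle strides or suboffsets.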

Antoine Pitrou <solipsis@pitrou.net> wrote:
I don't want to abolish the "c" (bytes of length 1) format. :) I think there are use cases for well-defined arrays of small signed/unsigned integers. Say you want to send a log-ngram array of unsigned chars over the network. There shouldn't be a bytes object involved in that process. You would pack the array with ints and unpack as ints.
Unless the struct module and PEP-3118 grow support for int8_t and uint8_t, I think "b" and "B" should probably be restricted to integers.
Stefan Krah
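The pack-as-ints, unpack-as-ints round trip described here can be sketched with the struct module. The array contents and the names `counts`, `payload` and `received` are hypothetical, and the network transfer itself is left out.

```python
import struct

counts = [0, 3, 17, 255]        # hypothetical small unsigned values

# Sender: pack the integers as unsigned chars ("B"), one byte each.
fmt = f"{len(counts)}B"
payload = struct.pack(fmt, *counts)

# Receiver: unpack back into integers; no length-1 bytes objects appear.
received = list(struct.unpack(fmt, payload))
print(received)
```

With "c" instead of "B", each element would have to be supplied and received as a bytes object of length 1, which is the detour the message argues against.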
