[New-bugs-announce] [issue23756] Tighten definition of bytes-like objects

Martin Panter report at bugs.python.org
Tue Mar 24 09:25:17 CET 2015

New submission from Martin Panter:

There are moves toward documenting and implementing support for “bytes-like” objects in more APIs, such as the “io” module (Issue 20699) and http.client (Issue 23740). The glossary definition is currently “An object that supports the Buffer Protocol, like bytes, bytearray or memoryview.” This was originally added for Issue 16518. However, after reading Issue 23688, I realized that it should probably not mean absolutely _any_ object supporting the buffer protocol. For instance:

>>> from sys import stdout
>>> reverse_view = memoryview(b"123")[::-1]
>>> stdout.buffer.write(reverse_view)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: memoryview: underlying buffer is not C-contiguous

I think the definition should at least be tightened to only objects with a contiguous buffer, and “contiguous” should be defined (probably in the linked C API page or the memoryview.contiguous flag definition, not the glossary). So far, my understanding is these are contiguous:

* A zero-dimensional object, such as a ctypes object
* A multi-dimensional array with items stored contiguously in order of increasing indexes, i.e. a_2,2 is stored somewhere after both a_1,2 and a_2,1, and all the strides are positive.

and these are not contiguous:

* memoryview(contiguous)[::2], because there are memory gaps between the items
* memoryview(contiguous)[::-1], despite there being neither gaps nor overlapping items
* Views that set the “suboffsets” field (i.e. include pointers to further memory)
* Views where different array items overlap each other (e.g. 0 in view.strides)
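The two lists above can be checked from Python using the memoryview attributes ndim, strides, contiguous and c_contiguous. The following is a minimal sketch, assuming CPython 3.3 or later; the variable names are illustrative only and do not come from any existing API:

```python
import ctypes

# A plain bytes object exports a one-dimensional, contiguous buffer.
base = memoryview(b"123456")

# A ctypes scalar exports a zero-dimensional buffer.
scalar = memoryview(ctypes.c_int(5))

# Slicing with a step changes the stride, so these views are not contiguous.
every_other = base[::2]    # memory gaps between the items
reverse_view = base[::-1]  # negative stride, but no gaps

for name, view in [("base", base), ("scalar", scalar),
                   ("every_other", every_other),
                   ("reverse_view", reverse_view)]:
    print(name, view.ndim, view.strides, view.contiguous)
```

Calling tobytes() on a non-contiguous view still works, because it copies the items into a new contiguous bytes object.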

Perhaps the bytes-like definition should be tightened further, to match the above error message, to only “C-contiguous” buffers. I understand that C-contiguous means the strides tuple has to be in non-strict decreasing order, e.g. for 2 × 1 × 3 arrays, strides == (3, 3, 1) is C-contiguous, but strides == (1, 3, 3) is not. This also needs documenting.
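The 2 × 1 × 3 example can be reproduced with memoryview.cast(). This is only a sketch, assuming single-byte items; strides_non_increasing() is a hypothetical helper that encodes just the ordering rule as worded above, not a complete C-contiguity test:

```python
def strides_non_increasing(strides):
    """Hypothetical helper: the ordering rule as worded in this report."""
    return all(a >= b for a, b in zip(strides, strides[1:]))

# View six bytes as a 2 x 1 x 3 array of unsigned bytes.
view = memoryview(bytes(range(6))).cast("B", [2, 1, 3])

print(view.strides)                          # strides chosen by cast()
print(view.c_contiguous)                     # flag reported by memoryview
print(strides_non_increasing(view.strides))  # ordering rule on real strides
print(strides_non_increasing((1, 3, 3)))     # the counter-example above
```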

I’m not so sure about these, but the definition could be tightened even further:

* Require memoryview(x).cast("B") to be supported. Otherwise, native Python code would have to use workarounds like struct.pack_into() to write to the “bytes-like” object. See Issue 15944.
* Require len(view) == view.nbytes. This would help avoid a bug I have seen, where code naively calls len(data) and treats the result as a byte count. The downside is that ctypes objects would no longer be considered bytes-like objects.
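Both proposed requirements can be illustrated with a buffer whose items are wider than one byte. The sketch below uses array.array rather than ctypes as a stand-in and assumes CPython 3.3 or later; the names are illustrative only:

```python
import array

# Three unsigned shorts: len() counts items, nbytes counts bytes.
items = array.array("H", [1, 2, 3])
view = memoryview(items)
print(len(view), view.nbytes)

# Casting to unsigned bytes gives a flat view where len() == nbytes.
flat = view.cast("B")
print(len(flat), flat.nbytes)

# The byte view is writable, so no struct.pack_into() workaround is needed.
for i in range(items.itemsize):
    flat[i] = 0xFF
print(items[0])
```

Code that sizes an output by len(data) would undercount for the first view but not for the cast one.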

assignee: docs at python
components: Documentation
messages: 239097
nosy: docs at python, vadmium
priority: normal
severity: normal
status: open
title: Tighten definition of bytes-like objects

Python tracker <report at bugs.python.org>
