[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467

Thu Jan 5 15:37:18 EST 2017

On Wed, Oct 12, 2016 at 12:08 AM, INADA Naoki <songofacandy at gmail.com>
wrote:
>
> Now I'm sure about bytes.frombuffer() is worth enough.

I would like to revive this thread (taking the liberty of shortening the
subject line).

The issue of how the bytes(x) constructor should behave when given objects
of various types has come up recently in issue 29159 (Regression in bytes
constructor). [1]

The regression was introduced in issue 27704 (bytes(x) is slow when x is
bytearray), which attempted to speed up creating bytes and bytearray from
bytes-like objects. [2]

I think the core problem is that the bytes(x) constructor tries to be the
Jack of All Trades.  Here is how it is documented in the docstring:

On the other hand, the reference manual, while not having this description
in the bytes section, has a similar list in the bytearray section. [3]

"""
The optional source parameter can be used to initialize the array in a few
different ways:

  * If it is a string, you must also give the encoding (and optionally,
errors) parameters; bytearray() then converts the string to bytes using
str.encode().
  * If it is an integer, the array will have that size and will be
initialized with null bytes.
  * If it is an object conforming to the buffer interface, a read-only
buffer of the object will be used to initialize the bytes array.
  * If it is an iterable, it must be an iterable of integers in the range 0
<= x < 256, which are used as the initial contents of the array.

Without an argument, an array of size 0 is created.
"""

Note that the integer case is listed before the buffer interface.  Neither
document mentions the possibility that the source type has a __bytes__
method.
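
To illustrate the undocumented case, here is a minimal sketch.  The Packet
class is hypothetical and exists only for this example; the point is that,
as far as I can tell from CPython's behavior, a __bytes__ method is consulted
before the integer interpretation even when the type also defines __index__.

```python
class Packet:
    """Hypothetical type, used only for illustration."""

    def __init__(self, payload):
        self.payload = payload

    def __index__(self):
        # Makes the object integer-like.
        return len(self.payload)

    def __bytes__(self):
        # bytes(x) looks this method up before trying the integer reading.
        return bytes(self.payload)


result = bytes(Packet([1, 2, 3]))
print(result)  # b'\x01\x02\x03', not three null bytes
```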

This ambiguity between integer-like and buffer-like sources causes a
problem in the case when a 3rd party type is both integer-like and
buffer-like.  This is what happens with numpy arrays:

>>> bytes(numpy.array([2], 'i1'))
b'\x00\x00'

>>> bytes(numpy.array([2, 2], 'i1'))
b'\x02\x02'

For better or worse, single-element numpy arrays have a working __index__
method:

>>> numpy.array([2], 'i1').__index__()
2

and are interpreted as integers by the bytes(x) constructor.
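
The same precedence can be reproduced without numpy.  The sketch below uses a
hypothetical Single class that is integer-like (via __index__) and iterable
rather than buffer-like, since that is easier to write in pure Python; it is
only an approximation of the numpy situation, not a model of numpy itself.

```python
class Single:
    """Hypothetical one-element container, loosely modelled on the numpy case."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Integer-like: this is what bytes(x) picks up first.
        return self.value

    def __iter__(self):
        # Sequence-like view of the same object.
        yield self.value


as_integer = bytes(Single(2))          # integer reading: two null bytes
as_sequence = bytes(list(Single(2)))   # sequence reading, forced via list()

print(as_integer)   # b'\x00\x00'
print(as_sequence)  # b'\x02'
```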

I propose the following:

1. For 3.6, restore and document 3.5 behavior.  Recommend that 3rd party
types that are both integer-like and buffer-like implement their own
__bytes__ method to resolve the bytes(x) ambiguity.

2. For 3.7, I would like to see a drastically simplified bytes(x):
2.1.  Accept only objects with a __bytes__ method or a sequence of ints in
range(256).
2.2.  Expand __bytes__ definition to accept optional encoding and errors
parameters.  Implement str.__bytes__(self, [encoding[, errors]]).
2.3.  Implement new specialized bytes.fromsize and bytes.frombuffer
constructors as per PEP 467 and Inada Naoki's proposals.
2.4.  Implement memoryview.__bytes__ method so that bytes(memoryview(x))
works as before.
2.5.  Implement a fast bytearray.__bytes__ method.

3. Consider promoting __bytes__ to a tp_bytes type slot.
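
As a rough sketch of what the specialized constructors in 2.3 would separate
out, here are two module-level helpers written with today's spellings.  The
names fromsize and frombuffer are stand-ins I made up for this example; their
signatures are assumptions and are not taken from PEP 467 or from Inada
Naoki's proposal.

```python
def fromsize(size):
    """Hypothetical stand-in for the proposed bytes.fromsize()."""
    if not isinstance(size, int):
        raise TypeError("size must be an int")
    # Today this is spelled bytes(n): n null bytes.
    return bytes(size)


def frombuffer(obj):
    """Hypothetical stand-in for the proposed bytes.frombuffer()."""
    # memoryview() raises TypeError for objects without the buffer
    # protocol, so an integer can never be mistaken for a buffer here.
    with memoryview(obj) as view:
        return view.tobytes()


zeros = fromsize(2)
copied = frombuffer(bytearray(b"ab"))
print(zeros)   # b'\x00\x00'
print(copied)  # b'ab'

try:
    frombuffer(2)
except TypeError:
    int_rejected = True
else:
    int_rejected = False
print(int_rejected)  # True
```

With the two cases split like this, the caller states which interpretation is
wanted, and bytes(x) itself no longer has to guess.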

[1]: http://bugs.python.org/issue29159
[2]: http://bugs.python.org/issue27704
[3]: https://docs.python.org/3/library/functions.html#bytearray