[Python-Dev] Adding bytes.frombuffer() constructor to PEP 467

Alexander Belopolsky alexander.belopolsky at gmail.com
Fri Jan 6 14:31:18 EST 2017


On Thu, Jan 5, 2017 at 5:54 PM, Serhiy Storchaka <storchaka at gmail.com>
wrote:

> On 05.01.17 22:37, Alexander Belopolsky wrote:
>
>> I propose the following:
>>
>> 1. For 3.6, restore and document 3.5 behavior.  Recommend that 3rd party
>> types that are both integer-like and buffer-like implement their own
>> __bytes__ method to resolve the bytes(x) ambiguity.
>>
>
> The __bytes__ method is used only by the bytes constructor, not by the
> bytearray constructor.


I am not sure this is deliberate.  See
<https://bugs.python.org/issue2415#msg71660>.
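
The asymmetry is easy to reproduce.  A minimal sketch, assuming a
hypothetical Token class that defines only __bytes__ (no buffer protocol,
no __iter__); the behavior shown is what current CPython 3 does, as far as
I can tell:

```python
class Token:
    """Hypothetical type that defines only __bytes__."""

    def __bytes__(self):
        return b"token"


t = Token()

# bytes() consults __bytes__ ...
as_bytes = bytes(t)

# ... but bytearray() does not, so the same object is rejected.
try:
    bytearray(t)
except TypeError as exc:
    bytearray_error = exc
else:
    bytearray_error = None
```

Here as_bytes ends up as b"token", while bytearray_error holds the
TypeError raised by the bytearray constructor.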

>> 2. For 3.7, I would like to see a drastically simplified bytes(x):
>> 2.1.  Accept only objects with a __bytes__ method or a sequence of ints
>> in range(256).
>> 2.2.  Expand __bytes__ definition to accept optional encoding and errors
>> parameters.  Implement str.__bytes__(self, [encoding[, errors]]).
>>
>
> I think it is better to use the encode() method if you want to encode from
> non-strings.


Possibly, but the goal of my proposal is to lighten the logic in the
bytes(x, [encoding[, errors]])
constructor.  If it detects x.__bytes__, it should just call it with
whatever arguments are given.
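
To make the intended dispatch concrete, here is a rough pure-Python sketch
of items 2.1 and 2.2.  Both proposed_bytes and the Text class are
hypothetical names used only for illustration; neither exists in CPython,
and the real constructor would of course be written in C:

```python
def proposed_bytes(x, *args):
    """Sketch of the simplified bytes(x, [encoding[, errors]]).

    Hypothetical helper, not an existing API.
    """
    method = getattr(type(x), "__bytes__", None)
    if method is not None:
        # Forward whatever arguments were given, unchanged.
        return method(x, *args)
    if args:
        raise TypeError("encoding or errors given without a __bytes__ method")
    # Fallback from item 2.1: a sequence of ints in range(256).
    return bytes(list(x))


class Text:
    """Hypothetical stand-in for str with the proposed __bytes__ signature."""

    def __init__(self, value):
        self.value = value

    def __bytes__(self, encoding="utf-8", errors="strict"):
        return self.value.encode(encoding, errors)


encoded = proposed_bytes(Text("\xe9"), "latin-1")
from_ints = proposed_bytes([1, 2, 3])
```

The constructor itself no longer needs to know anything about encodings;
that knowledge lives entirely in the type's __bytes__ method.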

>> 2.3.  Implement new specialized bytes.fromsize and bytes.frombuffer
>> constructors as per PEP 467 and Inada Naoki proposals.
>>
>
> bytes.fromsize(n) is just b'\0'*n. I don't think this method is needed.
>

I don't care much about this.  If it helps remove the bytes(int) case, I am
for it; otherwise ±0.
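
For reference, a small sketch of the bytes(int) case in question, using
only the existing constructor (bytes.fromsize itself is only a proposal):

```python
n = 4

# Today, an int argument means "this many zero bytes" ...
from_int = bytes(n)

# ... which is the same value as the explicit spelling.
from_repeat = b"\0" * n

# An iterable containing that int means something quite different.
from_iterable = bytes([n])
```

The first two names are bound to four zero bytes, the last one to the
single byte b'\x04'.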


>
> bytes.frombuffer(x) is bytes(memoryview(x)) or memoryview(x).tobytes().


I've just tried Inada's patch <http://bugs.python.org/issue29178>:

$ ./python.exe -m timeit -s "from array import array; x=array('f', [0])" \
    "bytes.frombuffer(x)"
2000000 loops, best of 5: 134 nsec per loop

$ ./python.exe -m timeit -s "from array import array; x=array('f', [0])" \
    "with memoryview(x) as m: bytes(m)"
500000 loops, best of 5: 436 nsec per loop

A 3x speed-up seems to be worth it.
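
For anyone without the patch applied, the spellings being compared can be
checked for equivalence with a sketch like the one below.  The frombuffer
function is a hypothetical pure-Python stand-in for the proposed method; it
reproduces the result only, not the speed-up measured above:

```python
from array import array


def frombuffer(obj):
    """Hypothetical stand-in for the proposed bytes.frombuffer()."""
    # The context manager releases the buffer export promptly.
    with memoryview(obj) as view:
        return view.tobytes()


x = array("f", [0])

via_helper = frombuffer(x)
via_constructor = bytes(memoryview(x))
via_array = x.tobytes()
```

All three names refer to equal bytes objects holding the raw memory of the
array.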


>> 2.4.  Implement memoryview.__bytes__ method so that bytes(memoryview(x))
>> works as before.
>> 2.5.  Implement a fast bytearray.__bytes__ method.
>>
>
> This wouldn't help for the bytearray constructor, and it wouldn't avoid
> double copying in the constructor of a bytes subclass.


I don't see why the bytearray constructor should behave differently from
the bytes constructor.


>> 3. Consider promoting __bytes__ to a tp_bytes type slot.
>>
>
> The buffer protocol is more general than the __bytes__ method. It lets the
> constructors of many types (bytes, bytearray, array.array, etc.), not just
> bytes, avoid redundant memory copying.
>

It looks like there are two different views on what the bytes type
represents.  Is it a sequence of small integers or a blob of binary data?

Compare these two calls:

>>> from array import array
>>> bytes(array('h', [1, 2, 3]))
b'\x01\x00\x02\x00\x03\x00'

and

>>> bytes(array('f', [1, 2, 3]))
b'\x00\x00\x80?\x00\x00\x00@\x00\x00@@'

For me the __bytes__ method is a way for types to specify their bytes
representation that may or may not be the same as memoryview(x).tobytes().
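
As an illustration of that last point, here is a minimal sketch assuming a
hypothetical array subclass, BigEndianShorts, that wants its bytes form to
be big-endian regardless of the platform.  It relies on bytes() consulting
__bytes__ before falling back to the buffer protocol:

```python
import struct
from array import array


class BigEndianShorts(array):
    """Hypothetical buffer-like type whose bytes form is always big-endian."""

    def __bytes__(self):
        return struct.pack(">%dh" % len(self), *self)


x = BigEndianShorts("h", [1, 2, 3])

chosen = bytes(x)              # the representation the type chose
raw = memoryview(x).tobytes()  # the native-endian memory contents
```

On a little-endian machine the two results differ; on a big-endian one
they happen to coincide, which is exactly the "may or may not" above.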