Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

10 Jun 2016

On 9 June 2016 at 19:21, Barry Warsaw wrote:
...
On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
...
Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence of
the given size::
    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
entirely in Python 3.7.
No other changes are proposed to the existing constructors.
Does it need to be *actually* removed?  That does break existing code for not
a lot of benefit.  Yes, the default constructor is a little wonky, but with
the addition of the new constructors, and the fact that you're not proposing
to eventually change the default constructor, removal seems unnecessary.
Besides, once it's removed, what would `bytes(3)` actually do?  The PEP
doesn't say.
Raise TypeError, presumably. However, I agree this isn't worth the
hassle of breaking working code, especially since truly ludicrous
values will fail promptly with MemoryError. The only potential problem
is the particular range of values that fit within the limits of the
machine but still push it into heavy swapping.
...
Also, since you're proposing to add `bytes.byte(3)` have you considered also
adding an optional count argument?  E.g. `bytes.byte(3, count=7)` would yield
b'\x03\x03\x03\x03\x03\x03\x03'.  That seems like it could be useful.
The purpose of bytes.byte() in the PEP is to provide a way to
roundtrip ord() calls with binary inputs, since the current spelling
is pretty unintuitive:

    >>> ord("A")
    65
    >>> chr(ord("A"))
    'A'
    >>> ord(b"A")
    65
    >>> bytes([ord(b"A")])
    b'A'

That said, perhaps it would make more sense for the corresponding
round-trip to be:

    >>> bchr(ord("A"))
    b'A'

With the "b" prefix on "chr" reflecting the "b" prefix on the output.
This also inverts the chr/unichr pairing that existed in Python 2
(replacing it with bchr/chr), and is hence very friendly to
compatibility modules like six and future (future.builtins already
provides a chr that behaves like the Python 3 one, and bchr would be
much easier to add to that than a new bytes object method).
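
For concreteness, here is a rough pure-Python sketch of what such a
bchr() might look like. The name comes from the suggestion above; the
body is only an assumption layered on the existing bytes([i]) spelling,
not a proposed implementation:

```python
def bchr(i):
    """Hypothetical helper: return the length-1 bytes object for integer i.

    Assumed to accept only single-byte values (0..255); the underlying
    bytes() call raises ValueError outside that range.
    """
    return bytes([i])


# Round-trips with ord() on binary input, mirroring chr(ord("A")) for text.
assert bchr(ord(b"A")) == b"A"
assert ord(bchr(65)) == 65
```

A compatibility module could presumably export something along these
lines on both Python 2 and 3, though the details there are untested.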

In terms of an efficient memory-preallocation interface, the
equivalent NumPy operation to request a pre-filled array is
"numpy.full":
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.full.html
(there's also an in-place mutation operation, "ndarray.fill")

For bytes and bytearray though, that has an unfortunate name collision
with "zfill", which refers to zero-padding numeric values for fixed
width display.
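
As a quick reminder of what the existing method does (and hence why a
similarly named "fill" or "full" could be confusing), a small example
using only current bytes and bytearray behaviour:

```python
# zfill left-pads an ASCII numeric value with b"0" digits to a fixed width.
padded = b"42".zfill(5)

# A leading sign stays in front of the padding.
signed = b"-42".zfill(5)

# bytearray has the same method and returns a new bytearray.
padded_array = bytearray(b"7").zfill(3)

print(padded, signed, padded_array)
```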

If the PEP just added bchr() to complement chr(), and [bytes,
bytearray].zeros() as a more discoverable alternative to passing
integers to the default constructor, I think that would be a decent
step forward, and the question of pre-initialising with arbitrary
values can be deferred for now (and perhaps left to NumPy
indefinitely).
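
To illustrate the intent, a minimal sketch assuming zeros() would simply
mirror what the integer constructor does today. The function names
zeros and filled below are hypothetical module-level stand-ins, not the
proposed methods themselves:

```python
def zeros(count, kind=bytes):
    """Hypothetical stand-in for the suggested [bytes, bytearray].zeros().

    Delegates to today's spelling: passing an integer to the constructor.
    """
    return kind(count)


def filled(value, count):
    """Hypothetical helper: pre-fill with an arbitrary byte value.

    Uses sequence repetition, which already works on current Python.
    """
    return bytes([value]) * count


assert zeros(3) == b"\x00\x00\x00"
assert zeros(3, bytearray) == bytearray(b"\x00\x00\x00")
assert filled(3, 7) == b"\x03\x03\x03\x03\x03\x03\x03"
```

The second helper is only there to show that the arbitrary-value case
already has a workable spelling, which is part of why it can wait.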

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia