Re: [Python-Dev] PEP 467: next round

July 19, 2016

(Thanks for moving this forward, Ethan!)

On 19 July 2016 at 06:17, Ethan Furman <ethan@stoneleaf.us> wrote:
...
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators
As a possible alternative to this aspect, what if we adjusted
memoryview.cast() to also support the "s" format code from the struct
module?

At the moment, trying to use "s" raises a ValueError:
...
...
...
bview = memoryview(data).cast("s")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ValueError: memoryview: destination format must be a native single
character format prefixed with an optional '@'
However, it could be supported by always interpreting it as equivalent
to "1s", such that the view produced length-1 bytes objects on
indexing and iteration, rather than integers (which is what it does
given the default "B" format).

Given "memoryview(data).cast('s')" as a basic building block, most of
the other aspects of working with bytes objects as if they were Python
2 strings should become relatively straightforward, so the question
would be whether we wanted to make it easy for people to avoid
constructing the mediating memoryview object.
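
A minimal sketch of the behaviour being discussed, runnable on current
Python 3: iterating a default memoryview yields integers, and slicing is
one way to get length-1 bytes objects instead. The helper name
iter_as_bytes is hypothetical; it only emulates what iterating over a
view cast with "s" would yield under this suggestion.

```python
def iter_as_bytes(data):
    """Yield each byte of a bytes-like object as a length-1 bytes object.

    Hypothetical helper: it emulates, via slicing, what iterating over
    memoryview(data).cast("s") would yield if "s" were supported.
    """
    view = memoryview(data)
    for i in range(len(view)):
        # Slicing a memoryview gives a sub-view; bytes() copies that one byte.
        yield bytes(view[i:i + 1])


data = b"abc"

# Current behaviour: iterating the default view produces integers.
ints = list(memoryview(data))

# Emulated "s" behaviour: length-1 bytes objects instead.
chunks = list(iter_as_bytes(data))

print(ints)    # [97, 98, 99]
print(chunks)  # [b'a', b'b', b'c']
```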
...
Proposals
=========
Deprecation of current "zero-initialised sequence" behaviour without removal
----------------------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::
    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
it in place for at least as long as Python 2.7 is supported, possibly
indefinitely.
I'd suggest being more explicit that this would just be a documented
deprecation, rather than a programmatic deprecation warning.
...
Addition of explicit "count and byte initialised sequence" constructors
-----------------------------------------------------------------------
To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``size`` alternative constructor as a class method on both
``bytes`` and ``bytearray`` whose first argument is the count, and whose
second argument is the fill byte to use (defaults to ``\x00``)::
    >>> bytes.size(3)
    b'\x00\x00\x00'
    >>> bytearray.size(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.size(5, b'\x0a')
    b'\x0a\x0a\x0a\x0a\x0a'
    >>> bytearray.size(5, b'\x0a')
    bytearray(b'\x0a\x0a\x0a\x0a\x0a')
While I like the notion of having "size" in the name, the
"noun-as-constructor" phrasing doesn't read right to me. Perhaps
"fromsize" for consistency with "fromhex"?
...
It will behave just as the current constructors behave when passed a single
integer.
This last paragraph feels incomplete now, given the expansion to allow
the fill value to be specified.
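
To make the expanded semantics concrete, here is a rough pure-Python
sketch of what a "fromsize" alternative constructor with a fill value
might do. The free function below is a hypothetical stand-in, not the
PEP's implementation; the argument checks are assumptions about
reasonable behaviour.

```python
def fromsize(cls, count, fill=b"\x00"):
    """Hypothetical stand-in for a proposed bytes/bytearray "fromsize".

    cls is bytes or bytearray, count is the length of the result, and
    fill is the single byte to repeat (assumed to default to a zero byte).
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    if count < 0:
        raise ValueError("negative count")
    # Sequence repetition builds the data; cls() picks the result type.
    return cls(fill * count)


print(fromsize(bytes, 3))             # b'\x00\x00\x00'
print(fromsize(bytearray, 5, b"\n"))  # bytearray(b'\n\n\n\n\n')
```

With the default fill this matches what bytes(3) and bytearray(3)
return today.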
...
Addition of "bchr" function and explicit "single byte" constructors
-------------------------------------------------------------------
As binary counterparts to the text ``chr`` function, this PEP proposes
the addition of a ``bchr`` function and an explicit ``fromint`` alternative
constructor as a class method on both ``bytes`` and ``bytearray``::
    >>> bchr(ord("A"))
    b'A'
    >>> bchr(ord(b"A"))
    b'A'
    >>> bytes.fromint(65)
    b'A'
    >>> bytearray.fromint(65)
    bytearray(b'A')
Since "fromsize" would also accept an int value, "fromint" feels
ambiguous here. Perhaps "fromord" to emphasise the integer is being
interpreted as an ordinal bytes value, rather than as a size?

The apparent "two ways to do it" here also deserves some additional explanation:

- the bchr builtin is to recreate the ord/chr/unichr trio from Python
2 under a different naming scheme
- the class method is mainly for the "bytearray.fromord" case, with
bytes.fromord added for consistency
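
As a rough illustration of the two spellings, both can be emulated
today by wrapping the integer in a one-element list. The functions
below are hypothetical pure-Python stand-ins, not the proposed builtin
or class methods themselves.

```python
def bchr(i):
    """Hypothetical pure-Python equivalent of the proposed bchr builtin."""
    # bytes([i]) raises ValueError unless 0 <= i <= 255.
    return bytes([i])


def fromord(cls, i):
    """Hypothetical stand-in for bytes.fromord / bytearray.fromord."""
    return cls([i])


print(bchr(ord("A")))           # b'A'
print(fromord(bytearray, 65))   # bytearray(b'A')
print(ord(bchr(200)))           # 200, so ord() round-trips
```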

[snip sections on accessing elements as bytes object]
...
Design discussion
=================
Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------
Zero-initialised sequences can be created via sequence repetition::
    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')
However, this was also the case when the ``bytearray`` type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable ``bytes`` type then inherited that feature
when it was introduced in PEP 3137.
This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity.
This note will need some tweaks to match the updated method names in
the proposal.
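
For reference, the ambiguity that section describes is easy to
reproduce on current Python 3. This short snippet only uses existing
constructor behaviour; the variable names are illustrative.

```python
x = 3

# An integer argument means "this many zero bytes" ...
zero_filled = bytes(x)

# ... while a one-element iterable means "the byte with this ordinal".
single_byte = bytes([x])

print(zero_filled)  # b'\x00\x00\x00'
print(single_byte)  # b'\x03'
```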

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia