[Python-Dev] PEP 467: next round

Tue Jul 19 00:12:08 EDT 2016

(Thanks for moving this forward, Ethan!)

On 19 July 2016 at 06:17, Ethan Furman <ethan at stoneleaf.us> wrote:
> * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
> * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
>   ``memoryview.iterbytes`` alternative iterators

As a possible alternative to this aspect, what if we adjusted
memoryview.cast() to also support the "s" format code from the struct
module?

At the moment, trying to use "s" raises a ValueError:

  >>> bview = memoryview(data).cast("s")
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'

However, it could be supported by always interpreting it as equivalent
to "1s", such that the view produced length 1 bytes objects on
indexing and iteration, rather than integers (which is what it does
given the default "B" format).

Given "memoryview(data).cast('s')" as a basic building block, most of
the other aspects of working with bytes objects as if they were Python
2 strings should become relatively straightforward, so the question
would be whether we wanted to make it easy for people to avoid
constructing the mediating memoryview object.
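
To make the idea concrete, here is a rough sketch of the behaviour
being proposed, emulated with slicing on today's memoryview. The
helper name iter_single_bytes is hypothetical and not part of any
proposal:

```python
def iter_single_bytes(data):
    """Yield each element of a bytes-like object as a length-1 bytes object.

    Hypothetical helper: emulates what iterating over a "1s" view might do.
    """
    view = memoryview(data)
    for i in range(len(view)):
        # Slicing a memoryview gives another view; bytes() copies the
        # single byte out of it.
        yield bytes(view[i:i + 1])


data = b"abc"
ints = list(memoryview(data))             # current default: integers
singles = list(iter_single_bytes(data))   # emulated length-1 bytes objects
print(ints)
print(singles)
```

The open question above is then whether callers should have to spell
out the intermediate view themselves.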

> Proposals
> =========
>
> Deprecation of current "zero-initialised sequence" behaviour without removal
> ----------------------------------------------------------------------------
>
> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
>
>     >>> bytes(3)
>     b'\x00\x00\x00'
>     >>> bytearray(3)
>     bytearray(b'\x00\x00\x00')
>
> This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
> it in place for at least as long as Python 2.7 is supported, possibly
> indefinitely.

I'd suggest being more explicit that this would just be a documented
deprecation, rather than a programmatic deprecation warning.
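
For contrast, a programmatic deprecation would mean something like the
following happening at runtime. This is only a sketch: zero_filled is
a hypothetical stand-in for the constructor, and the message text is
made up for illustration:

```python
import warnings


def zero_filled(count):
    """Hypothetical stand-in for bytes(count) that warns at runtime."""
    warnings.warn(
        "creating a zero-initialised sequence from an integer is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return bytes(count)


# Capture the warning so its effect is visible regardless of filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = zero_filled(3)

print(result, [w.category.__name__ for w in caught])
```

A documented deprecation would leave the runtime behaviour untouched
and only change the docs.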

> Addition of explicit "count and byte initialised sequence" constructors
> -----------------------------------------------------------------------
>
> To replace the deprecated behaviour, this PEP proposes the addition of an
> explicit ``size`` alternative constructor as a class method on both
> ``bytes`` and ``bytearray`` whose first argument is the count, and whose
> second argument is the fill byte to use (defaults to ``\x00``)::
>
>     >>> bytes.size(3)
>     b'\x00\x00\x00'
>     >>> bytearray.size(3)
>     bytearray(b'\x00\x00\x00')
>     >>> bytes.size(5, b'\x0a')
>     b'\x0a\x0a\x0a\x0a\x0a'
>     >>> bytearray.size(5, b'\x0a')
>     bytearray(b'\x0a\x0a\x0a\x0a\x0a')

While I like the notion of having "size" in the name, the
"noun-as-constructor" phrasing doesn't read right to me. Perhaps
"fromsize" for consistency with "fromhex"?

> It will behave just as the current constructors behave when passed a single
> integer.

This last paragraph feels incomplete now, given the expansion to allow
the fill value to be specified.
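
As a rough pure-Python sketch of the semantics I have in mind
(assuming the "fromsize" spelling, written here as a free function
taking the class explicitly, and assuming a non-single-byte fill
should be an error, which the PEP text doesn't spell out):

```python
def fromsize(cls, count, fill=b"\x00"):
    """Sketch of the proposed alternative constructor (hypothetical spelling).

    cls is bytes or bytearray; fill must be exactly one byte.
    """
    if len(fill) != 1:
        # Assumption: anything other than a single byte is rejected.
        raise ValueError("fill must be a single byte")
    return cls(fill * count)


print(fromsize(bytes, 3))
print(fromsize(bytearray, 5, b"\x0a"))
```

With the default fill, this matches what the existing constructors do
when passed a single integer.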

> Addition of "bchr" function and explicit "single byte" constructors
> -------------------------------------------------------------------
>
> As binary counterparts to the text ``chr`` function, this PEP proposes
> the addition of a ``bchr`` function and an explicit ``fromint`` alternative
> constructor as a class method on both ``bytes`` and ``bytearray``::
>
>     >>> bchr(ord("A"))
>     b'A'
>     >>> bchr(ord(b"A"))
>     b'A'
>     >>> bytes.fromint(65)
>     b'A'
>     >>> bytearray.fromint(65)
>     bytearray(b'A')

Since "fromsize" would also accept an int value, "fromint" feels
ambiguous here. Perhaps "fromord" to emphasise the integer is being
interpreted as an ordinal bytes value, rather than as a size?

The apparent "two ways to do it" here also deserves some additional explanation:

- the bchr builtin is to recreate the ord/chr/unichr trio from Python
  2 under a different naming scheme
- the class method is mainly for the "bytearray.fromord" case, with
  bytes.fromord added for consistency
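
Both spellings can be approximated today with a one-element list. A
minimal sketch, assuming the "fromord" name and writing it as a free
function that takes the class explicitly (neither helper exists in the
standard library):

```python
def bchr(i):
    """Binary counterpart to chr(): one integer in, length-1 bytes out."""
    return bytes([i])


def fromord(cls, i):
    """Sketch of the proposed class method; cls is bytes or bytearray."""
    # Out-of-range values raise ValueError, as the constructors do today.
    return cls([i])


print(bchr(ord("A")))
print(fromord(bytearray, 65))
```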

[snip sections on accessing elements as bytes object]

> Design discussion
> =================
>
> Why not rely on sequence repetition to create zero-initialised sequences?
> -------------------------------------------------------------------------
>
> Zero-initialised sequences can be created via sequence repetition::
>
>     >>> b'\x00' * 3
>     b'\x00\x00\x00'
>     >>> bytearray(b'\x00') * 3
>     bytearray(b'\x00\x00\x00')
>
> However, this was also the case when the ``bytearray`` type was originally
> designed, and the decision was made to add explicit support for it in the
> type constructor. The immutable ``bytes`` type then inherited that feature
> when it was introduced in PEP 3137.
>
> This PEP isn't revisiting that original design decision, just changing the
> spelling as users sometimes find the current behaviour of the binary sequence
> constructors surprising. In particular, there's a reasonable case to be made
> that ``bytes(x)`` (where ``x`` is an integer) should behave like the
> ``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
> class methods avoids that ambiguity.

This note will need some tweaks to match the updated method names in
the proposal.
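
For reference, the ambiguity that paragraph describes looks like this
when spelled with today's constructors only (the variable names are
just for illustration):

```python
# An integer argument is treated as a size...
as_size = bytes(3)

# ...whereas getting a single byte with that ordinal value currently
# needs a one-element iterable.
as_ordinal = bytes([3])

print(as_size)
print(as_ordinal)
```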

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia