[Python-Dev] PEP 467: next round
Nick Coghlan
ncoghlan at gmail.com
Tue Jul 19 00:12:08 EDT 2016
(Thanks for moving this forward, Ethan!)
On 19 July 2016 at 06:17, Ethan Furman <ethan at stoneleaf.us> wrote:
> * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
> * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
> ``memoryview.iterbytes`` alternative iterators
As a possible alternative to this aspect, what if we adjusted
memoryview.cast() to also support the "s" format code from the struct
module?
At the moment, trying to use "s" raises a ValueError:
>>> bview = memoryview(data).cast("s")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'
However, it could be supported by always interpreting it as equivalent
to "1s", such that the view produced length 1 bytes objects on
indexing and iteration, rather than integers (which is what it does
given the default "b" format).
Given "memoryview(data).cast('s')" as a basic building block, most of
the other aspects of working with bytes objects as if they were Python
2 strings should become relatively straightforward, so the question
would be whether we wanted to make it easy for people to avoid
constructing the mediating memoryview object.
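For comparison, here is a minimal sketch of the difference between what
iteration gives today and what a view cast to "s" would give. The helper name
`iter_length1` is hypothetical: it only emulates the proposed behaviour by
slicing, assumes byte-oriented input, and is not the proposed implementation.

```python
def iter_length1(data):
    """Yield each element of a bytes-like object as a length 1 bytes object.

    Hypothetical helper: emulates the proposed cast("s") iteration by slicing.
    Assumes the underlying buffer has one byte per item.
    """
    view = memoryview(data)
    for i in range(len(view)):
        # Slicing a view yields a sub-view; bytes() copies out that one byte
        yield bytes(view[i:i + 1])


data = b"abc"
ints = list(data)                  # integers, as with the default format
chunks = list(iter_length1(data))  # length 1 bytes objects
print(ints)
print(chunks)
```

The second list is what Python 2 style string handling expects to see when
walking over binary data.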
> Proposals
> =========
>
> Deprecation of current "zero-initialised sequence" behaviour without removal
> ----------------------------------------------------------------------------
>
> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
>
> >>> bytes(3)
> b'\x00\x00\x00'
> >>> bytearray(3)
> bytearray(b'\x00\x00\x00')
>
> This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
> it in place for at least as long as Python 2.7 is supported, possibly
> indefinitely.
I'd suggest being more explicit that this would just be a documented
deprecation, rather than a programmatic deprecation warning.
> Addition of explicit "count and byte initialised sequence" constructors
> -----------------------------------------------------------------------
>
> To replace the deprecated behaviour, this PEP proposes the addition of an
> explicit ``size`` alternative constructor as a class method on both
> ``bytes`` and ``bytearray`` whose first argument is the count, and whose
> second argument is the fill byte to use (defaults to ``\x00``)::
>
> >>> bytes.size(3)
> b'\x00\x00\x00'
> >>> bytearray.size(3)
> bytearray(b'\x00\x00\x00')
> >>> bytes.size(5, b'\x0a')
> b'\x0a\x0a\x0a\x0a\x0a'
> >>> bytearray.size(5, b'\x0a')
> bytearray(b'\x0a\x0a\x0a\x0a\x0a')
While I like the notion of having "size" in the name, the
"noun-as-constructor" phrasing doesn't read right to me. Perhaps
"fromsize" for consistency with "fromhex"?
> It will behave just as the current constructors behave when passed a single
> integer.
This last paragraph feels incomplete now, given the expansion to allow
the fill value to be specified.
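To make the expanded semantics concrete, a rough pure-Python sketch of the
count-plus-fill behaviour might look like the following. It is written as a
standalone function taking the target class as its first argument, which is an
assumption made for illustration (the proposal is for a class method), and the
single-byte check on the fill value is likewise my assumption rather than
something the PEP text specifies.

```python
def fromsize(cls, count, fill=b"\x00"):
    """Hypothetical stand-in for the proposed alternative constructor."""
    if len(fill) != 1:
        # Assumed validation: the PEP text does not spell this out
        raise ValueError("fill must be a single byte")
    # Sequence repetition builds the filled sequence
    return cls(fill * count)


zeros = fromsize(bytes, 3)
newlines = fromsize(bytearray, 5, b"\x0a")
print(zeros)
print(newlines)
```

With only the count given, the result matches what the current constructors
return for a single integer; the fill argument is the part the quoted
paragraph no longer covers.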
> Addition of "bchr" function and explicit "single byte" constructors
> -------------------------------------------------------------------
>
> As binary counterparts to the text ``chr`` function, this PEP proposes
> the addition of a ``bchr`` function and an explicit ``fromint`` alternative
> constructor as a class method on both ``bytes`` and ``bytearray``::
>
> >>> bchr(ord("A"))
> b'A'
> >>> bchr(ord(b"A"))
> b'A'
> >>> bytes.fromint(65)
> b'A'
> >>> bytearray.fromint(65)
> bytearray(b'A')
Since "fromsize" would also accept an int value, "fromint" feels
ambiguous here. Perhaps "fromord" to emphasise the integer is being
interpreted as an ordinal byte value, rather than as a size?
The apparent "two ways to do it" here also deserves some additional explanation:
- the bchr builtin is to recreate the ord/chr/unichr trio from Python
2 under a different naming scheme
- the class method is mainly for the "bytearray.fromord" case, with
bytes.fromord added for consistency
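As an illustration of how the two spellings relate, both can be emulated today
by wrapping the integer in a list. The functions below are hypothetical
stand-ins (the real proposals are a builtin and a class method); they are only
meant to show the intended results.

```python
def bchr(i):
    """Hypothetical pure-Python stand-in for the proposed bchr builtin."""
    # bytes([i]) raises ValueError for values outside range(256)
    return bytes([i])


def fromord(cls, i):
    """Hypothetical stand-in for the suggested fromord class method."""
    return cls([i])


single = bchr(ord("A"))
mutable = fromord(bytearray, 65)
print(single)
print(mutable)
```

The bytearray case is the one that has no short spelling via bchr alone,
which is why the class method carries its weight there.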
[snip sections on accessing elements as bytes objects]
> Design discussion
> =================
>
> Why not rely on sequence repetition to create zero-initialised sequences?
> -------------------------------------------------------------------------
>
> Zero-initialised sequences can be created via sequence repetition::
>
> >>> b'\x00' * 3
> b'\x00\x00\x00'
> >>> bytearray(b'\x00') * 3
> bytearray(b'\x00\x00\x00')
>
> However, this was also the case when the ``bytearray`` type was originally
> designed, and the decision was made to add explicit support for it in the
> type constructor. The immutable ``bytes`` type then inherited that feature
> when it was introduced in PEP 3137.
>
> This PEP isn't revisiting that original design decision, just changing the
> spelling as users sometimes find the current behaviour of the binary sequence
> constructors surprising. In particular, there's a reasonable case to be made
> that ``bytes(x)`` (where ``x`` is an integer) should behave like the
> ``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
> class methods avoids that ambiguity.
This note will need some tweaks to match the updated method names in
the proposal.
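For anyone who hasn't run into the surprise being described, the two readings
of an integer argument can be compared directly with current Python. The
variable names here are just for illustration.

```python
x = 3
as_size = bytes(x)       # current behaviour: x zero bytes
as_ordinal = bytes([x])  # the other plausible reading: one byte with value x
print(as_size)
print(as_ordinal)
```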
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia