[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Request for Pronouncement

Guido van Rossum guido at python.org
Thu Mar 27 21:44:10 CET 2014


Accepted. Congrats on marshalling yet another quite contentious
discussion, and putting up with my last-minute block-headedness!

If you're going to commit another change, may I suggest adding, to the
section stating that %r is not supported, that %a is usually a suitable
replacement for %r?


On Thu, Mar 27, 2014 at 1:07 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> Requesting pronouncement on PEP 461.  Full text below.
>
> ===============================================================================
> PEP: 461
> Title: Adding % formatting to bytes and bytearray
> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman <ethan at stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-01-13
> Python-Version: 3.5
> Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25,
>               2014-03-27
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes adding % formatting operations similar to Python 2's ``str``
> type to ``bytes`` and ``bytearray`` [1]_ [2]_.
>
>
> Rationale
> =========
>
> While interpolation is usually thought of as a string operation, there are
> cases where interpolation on ``bytes`` or ``bytearray`` makes sense, and the
> work needed to make up for this missing functionality detracts from the
> overall readability of the code.
>
>
> Motivation
> ==========
>
> With Python 3 and the split between ``str`` and ``bytes``, one small but
> important area of programming became slightly more difficult, and much more
> painful -- wire format protocols [3]_.
>
> This area of programming is characterized by a mixture of binary data and
> ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
> writing new wire format code, and in porting Python 2 wire format code.
>
> Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
> formats, and ``FTP`` and ``HTTP`` communications, among many others.
>
>
> Proposed semantics for ``bytes`` and ``bytearray`` formatting
> =============================================================
>
> %-interpolation
> ---------------
>
> All the numeric formatting codes (``d``, ``i``, ``o``, ``u``, ``x``, ``X``,
> ``e``, ``E``, ``f``, ``F``, ``g``, ``G``, and any that are subsequently added
> to Python 3) will be supported, and will work as they do for ``str``, including
> the padding, justification and other related modifiers (currently ``#``,
> ``0``, ``-``, `` `` (space), and ``+`` (plus any added to Python 3)).  The only
> non-numeric codes allowed are ``c``, ``b``, ``a``, and ``s`` (which is a
> synonym for ``b``).
>
> For the numeric codes, the only difference between ``str`` and ``bytes`` (or
> ``bytearray``) interpolation is that the results from these codes will be
> ASCII-encoded text, not unicode.  In other words, for any numeric formatting
> code ``%x``::
>
>    b"%x" % val
>
> is equivalent to::
>
>    ("%x" % val).encode("ascii")
>
> Examples::
>
>    >>> b'%4x' % 10
>    b'   a'
>
>    >>> b'%#4x' % 10
>    b' 0xa'
>
>    >>> b'%04X' % 10
>    b'000A'
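
To make the equivalence above concrete, here is a minimal sketch that runs a few
numeric codes through both paths and compares them.  It assumes an interpreter
in which the PEP is implemented (Python 3.5 or later); the helper name
``format_both`` and the sample codes are mine, not part of the PEP.

```python
# Sketch only: assumes Python 3.5+, where bytes %-formatting exists.
# `format_both` is a hypothetical helper, not something the PEP defines.

def format_both(code, val):
    """Format `val` with a numeric code via bytes and via str+encode."""
    via_bytes = code.encode("ascii") % val      # b"%x" % val
    via_str = (code % val).encode("ascii")      # ("%x" % val).encode("ascii")
    return via_bytes, via_str

cases = [("%4x", 10), ("%#4x", 10), ("%04X", 10), ("%+.2f", 3.14159), ("%-6d|", 42)]
results = [format_both(code, val) for code, val in cases]

for (code, val), (via_bytes, via_str) in zip(cases, results):
    print(code, val, via_bytes, via_bytes == via_str)

# The same codes applied to a bytearray template yield a bytearray.
padded = bytearray(b"%03d") % 7
```

Every pair should compare equal, and the result type follows the template type
(``bytes`` in, ``bytes`` out; ``bytearray`` in, ``bytearray`` out).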
>
> ``%c`` will insert a single byte, either from an ``int`` in range(256), or
> from a ``bytes`` argument of length 1, not from a ``str``.
>
> Examples::
>
>     >>> b'%c' % 48
>     b'0'
>
>     >>> b'%c' % b'a'
>     b'a'
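
A small sketch of the accepted and rejected ``%c`` arguments, again assuming
Python 3.5 or later.  The helper ``try_c`` is hypothetical; it returns ``None``
for any rejected argument rather than asserting a particular exception type or
message, since the PEP text above does not pin those down.

```python
# Sketch only: assumes Python 3.5+.  `try_c` is a hypothetical helper.

def try_c(arg):
    """Return b'%c' % arg, or None if the argument is rejected."""
    try:
        return b'%c' % (arg,)
    except (TypeError, OverflowError):
        return None

from_int = try_c(65)            # an int in range(256)
from_byte = try_c(b'\xff')      # a bytes object of length 1
from_str = try_c('a')           # str is rejected
from_big_int = try_c(256)       # outside range(256) is rejected
from_long_bytes = try_c(b'ab')  # bytes of length != 1 is rejected
```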
>
> ``%b`` will insert a series of bytes.  These bytes are collected in one of
> two ways:
>
>   - input type supports ``Py_buffer`` [4]_?
>     use it to collect the necessary bytes
>
>   - input type is something else?
>     use its ``__bytes__`` method [5]_ ; if there isn't one, raise a
>     ``TypeError``
>
> In particular, ``%b`` will accept neither numbers nor ``str``.  ``str`` is
> rejected as the string to bytes conversion requires an encoding, and we are
> refusing to guess; numbers are rejected because:
>
>   - what makes a number is fuzzy (float? Decimal? Fraction? some user type?)
>
>   - allowing numbers would lead to ambiguity between numbers and textual
>     representations of numbers (3.14 vs '3.14')
>
>   - given the nature of wire formats, explicit is definitely better than
>     implicit
>
> ``%s`` is included as a synonym for ``%b`` for the sole purpose of making
> 2/3 code bases easier to maintain.  Python 3 only code should use ``%b``.
>
> Examples::
>
>     >>> b'%b' % b'abc'
>     b'abc'
>
>     >>> b'%b' % 'some string'.encode('utf8')
>     b'some string'
>
>     >>> b'%b' % 3.14
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%b' does not accept 'float'
>
>     >>> b'%b' % 'hello world!'
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%b' does not accept 'str'
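
The two collection paths for ``%b`` can be sketched as follows, assuming
Python 3.5 or later.  ``Frame`` is a hypothetical user type that only defines
``__bytes__``; ``interpolate`` is likewise an illustrative helper that returns
``None`` when ``%b`` raises ``TypeError``.

```python
# Sketch only: assumes Python 3.5+.  `Frame` and `interpolate` are
# hypothetical names used for illustration.
import array

class Frame:
    """A user type with no buffer support, only a __bytes__ method."""
    def __init__(self, payload):
        self.payload = payload

    def __bytes__(self):
        return b'<' + self.payload + b'>'

def interpolate(value):
    """Return b'%b' % value, or None if %b rejects the value."""
    try:
        return b'%b' % (value,)
    except TypeError:
        return None

# Path 1: objects exposing the buffer protocol.
from_memoryview = interpolate(memoryview(b'abc'))
from_bytearray = interpolate(bytearray(b'abc'))
from_array = interpolate(array.array('B', [1, 2, 3]))

# Path 2: anything else with a __bytes__ method.
from_frame = interpolate(Frame(b'hi'))

# Rejected: str (no encoding is guessed) and numbers.
from_str = interpolate('hello world!')
from_float = interpolate(3.14)
```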
>
>
> ``%a`` will give the equivalent of
> ``repr(some_obj).encode('ascii', 'backslashreplace')`` on the interpolated
> value.  Use cases include developing a new protocol and writing landmarks
> into the stream; debugging data going into an existing protocol to see if
> the problem is the protocol itself or bad data; a fall-back for a
> serialization format; or any situation where defining ``__bytes__`` would not
> be appropriate but a readable/informative representation is needed [6]_.
>
> Examples::
>
>     >>> b'%a' % 3.14
>     b'3.14'
>
>     >>> b'%a' % b'abc'
>     b"b'abc'"
>
>     >>> b'%a' % 'def'
>     b"'def'"
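
As a quick check of the stated equivalence, the sketch below compares ``%a``
against ``repr(obj).encode('ascii', 'backslashreplace')`` for a handful of
values, including non-ASCII text.  It assumes Python 3.5 or later; the helper
``ascii_repr`` and the sample values are my own choices.

```python
# Sketch only: assumes Python 3.5+.  `ascii_repr` is a hypothetical helper
# spelling out the equivalence quoted above.

def ascii_repr(obj):
    return repr(obj).encode('ascii', 'backslashreplace')

samples = [3.14, b'abc', 'def', 'caf\u00e9', '\u20ac', None, [1, 'x']]
pairs = [(b'%a' % (obj,), ascii_repr(obj)) for obj in samples]

for obj, (via_format, via_repr) in zip(samples, pairs):
    print(via_format, via_format == via_repr)
```

Non-ASCII characters come out as backslash escapes, so the result is always
pure ASCII regardless of the input.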
>
>
> Unsupported codes
> -----------------
>
> ``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
>
>
> Compatibility with Python 2
> ===========================
>
> As noted above, ``%s`` is being included solely to help ease migration from,
> and/or have a single code base with, Python 2.  This is important as there
> are modules both in the wild and behind closed doors that currently use the
> Python 2 ``str`` type as a ``bytes`` container, and hence are using ``%s``
> as a bytes interpolator.
>
> However, ``%b`` should be used in new, Python 3 only code, so ``%s`` will
> immediately be deprecated, but not removed until the next major Python
> release.
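
For a feel of what this means in practice, here is a rough sketch of
wire-format code in both spellings, assuming Python 3.5 or later.  The function
names and the HTTP-like response are illustrative assumptions; ``status_line``
uses ``%s`` as 2/3-compatible code would, while ``header`` uses ``%b`` as the
PEP recommends for Python 3 only code.

```python
# Sketch only: assumes Python 3.5+.  `status_line` and `header` are
# hypothetical helpers; the response layout is just an example.

def status_line(version, code, reason):
    # %s with bytes arguments: the spelling shared with Python 2 code,
    # where b'...' literals are plain str.
    return b'HTTP/%s %d %s\r\n' % (version, code, reason)

def header(name, value):
    # %b: the spelling recommended for Python 3 only code.
    return b'%b: %b\r\n' % (name, value)

body = b'hello'
response = (
    status_line(b'1.1', 200, b'OK')
    + header(b'Content-Length', b'%d' % len(body))
    + b'\r\n'
    + body
)

# With bytes arguments, %s and %b give identical results.
same = (b'%s' % (b'abc',)) == (b'%b' % (b'abc',))
```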
>
>
> Proposed variations
> ===================
>
> It has been proposed to automatically use ``.encode('ascii','strict')`` for
> ``str`` arguments to ``%b``.
>
>   - Rejected as this would lead to intermittent failures.  Better to have the
>     operation always fail so the trouble-spot can be correctly fixed.
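
The "intermittent failures" concern can be modelled with a short sketch.  The
function ``rejected_percent_b`` below is a hypothetical stand-in for the
rejected behaviour (it is not how ``%b`` works); it shows that success would
depend on the data rather than on the code.  Assumes Python 3.5 or later.

```python
# Sketch only: `rejected_percent_b` models the REJECTED variation, in which
# str arguments would be encoded with .encode('ascii', 'strict').

def rejected_percent_b(value):
    if isinstance(value, str):
        value = value.encode('ascii', 'strict')
    return b'%b' % (value,)

def outcome(value):
    """Return the formatted bytes, or None if encoding fails."""
    try:
        return rejected_percent_b(value)
    except UnicodeEncodeError:
        return None

works_in_testing = outcome('Smith')         # pure ASCII: succeeds
fails_in_production = outcome('M\u00fcller')  # non-ASCII: fails at run time
```

The same call site passes or fails depending on the input, which is the
behaviour the PEP avoids by always rejecting ``str``.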
>
> It has been proposed to have ``%b`` return the ascii-encoded repr when the
> value is a ``str`` (b'%b' % 'abc'  --> b"'abc'").
>
>   - Rejected as this would lead to hard-to-debug failures far from the
>     problem site.  Better to have the operation always fail so the
>     trouble-spot can be easily fixed.
>
> Originally this PEP also proposed adding format-style formatting, but it was
> decided that format and its related machinery were all strictly text (aka
> ``str``) based, and it was dropped.
>
> Various new special methods were proposed, such as ``__ascii__``,
> ``__format_bytes__``, etc.; such methods are not needed at this time, but can
> be revisited later if real-world use shows deficiencies with this solution.
>
> A competing PEP, ``PEP 460 Add binary interpolation and formatting`` [7]_,
> also exists.
>
>
> Objections
> ==========
>
> The objections raised against this PEP were mainly variations on two themes:
>
>   - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
>     assumptions about encodings
>
>   - offering %-interpolation that assumes an ASCII encoding will be an
>     attractive nuisance and lead us back to the problems of the Python 2
>     ``str``/``unicode`` text model
>
> As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
> for mixed binary data and ASCII-compatible segments: file formats such as
> ``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.
>
> ``bytes`` and ``bytearray`` already have several methods which assume an
> ASCII-compatible encoding: ``upper()``, ``isalpha()``, and ``expandtabs()``,
> to name just a few.  %-interpolation, with its very restricted mini-language,
> will not be any more of a nuisance than the already existing methods.
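
The existing methods named above can be seen treating bytes as ASCII in a short
sketch; the sample values are mine.  These methods exist on ``bytes`` in all
Python 3 versions.

```python
# Sketch only: the sample data is made up for illustration.

data = b'get /index\t\xff'

# upper() changes only ASCII letters; the 0xff byte is left untouched.
uppered = data.upper()

# isalpha() is true only for ASCII letters.
ascii_letters = b'abc'.isalpha()
latin1_letter = b'\xe9'.isalpha()   # 0xe9 is a letter in Latin-1, not in ASCII

# expandtabs() interprets 0x09 as an ASCII tab.
expanded = b'a\tb'.expandtabs(4)
```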
>
> Some have objected to allowing the full range of numeric formatting codes
> with the claim that decimal alone would be sufficient.  However, at least two
> formats (dbf and pdf) make use of non-decimal numbers.
>
>
> Footnotes
> =========
>
> .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
> .. [2] neither string.Template, format, nor str.format are under consideration
> .. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
> .. [4] http://docs.python.org/3/c-api/buffer.html
>        examples:  ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
> .. [5] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
> .. [6] https://mail.python.org/pipermail/python-dev/2014-February/132750.html
> .. [7] http://python.org/dev/peps/pep-0460/
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>



-- 
--Guido van Rossum (python.org/~guido)