[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Daniel Holth dholth at gmail.com
Wed Mar 26 03:10:57 CET 2014


I love it.

On Tue, Mar 25, 2014 at 6:37 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> Okay, I included that last round of comments (from late February).
>
> Barring typos, this should be the final version.
>
> Final comments?
>
> -----------------------------------------------------------------------------
> PEP: 461
> Title: Adding % formatting to bytes and bytearray
> Version: $Revision$
> Last-Modified: $Date$
> Author: Ethan Furman <ethan at stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-01-13
> Python-Version: 3.5
> Post-History: 2014-01-14, 2014-01-15, 2014-01-17, 2014-02-22, 2014-03-25
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes adding % formatting operations similar to Python 2's
> ``str``
> type to ``bytes`` and ``bytearray`` [1]_ [2]_.
>
>
> Rationale
> =========
>
> While interpolation is usually thought of as a string operation, there are
> cases where interpolation on ``bytes`` or ``bytearrays`` make sense, and the
> work needed to make up for this missing functionality detracts from the
> overall
> readability of the code.
>
>
> Motivation
> ==========
>
> With Python 3 and the split between ``str`` and ``bytes``, one small but
> important area of programming became slightly more difficult, and much more
> painful -- wire format protocols [3]_.
>
> This area of programming is characterized by a mixture of binary data and
> ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
> writing new wire format code, and in porting Python 2 wire format code.
>
> Common use-cases include ``dbf`` and ``pdf`` file formats, ``email``
> formats, and ``FTP`` and ``HTTP`` communications, among many others.
>
>
> Proposed semantics for ``bytes`` and ``bytearray`` formatting
> =============================================================
>
> %-interpolation
> ---------------
>
> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> ``%g``, etc.) will be supported, and will work as they do for str, including
> the padding, justification and other related modifiers.  The only difference
> will be that the results from these codes will be ASCII-encoded text, not
> unicode.  In other words, for any numeric formatting code `%x`::
>
>    b"%x" % val
>
> is equivalent to
>
>    ("%x" % val).encode("ascii")
>
> Examples::
>
>    >>> b'%4x' % 10
>    b'   a'
>
>    >>> b'%#4x' % 10
>    ' 0xa'
>
>    >>> b'%04X' % 10
>    '000A'
>
> ``%c`` will insert a single byte, either from an ``int`` in range(256), or
> from
> a ``bytes`` argument of length 1, not from a ``str``.
>
> Examples::
>
>     >>> b'%c' % 48
>     b'0'
>
>     >>> b'%c' % b'a'
>     b'a'
>
> ``%s`` is included for two reasons:  1) `b` is already a format code for
> ``format`` numerics (binary), and 2) it will make 2/3 code easier as Python
> 2.x
> code uses ``%s``; however, it is restricted in what it will accept::
>
>   - input type supports ``Py_buffer`` [6]_?
>     use it to collect the necessary bytes
>
>   - input type is something else?
>     use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
> ``TypeError``
>
> In particular, ``%s`` will not accept numbers (use a numeric format code for
> that), nor ``str`` (encode it to ``bytes``).
>
> Examples::
>
>     >>> b'%s' % b'abc'
>     b'abc'
>
>     >>> b'%s' % 'some string'.encode('utf8')
>     b'some string'
>
>     >>> b'%s' % 3.14
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%s' does not accept numbers, use a numeric code instead
>
>     >>> b'%s' % 'hello world!'
>     Traceback (most recent call last):
>     ...
>     TypeError: b'%s' does not accept 'str', it must be encoded to `bytes`
>
>
> ``%a`` will call ``ascii()`` on the interpolated value.  This is intended
> as a debugging aid, rather than something that should be used in production.
> Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
> representation.  Use cases include developing a new protocol and writing
> landmarks into the stream; debugging data going into an existing protocol
> to see if the problem is the protocol itself or bad data; a fall-back for a
> serialization format; or even a rudimentary serialization format when
> defining ``__bytes__`` would not be appropriate [8].
>
> .. note::
>
>     If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
>
>
> Unsupported codes
> -----------------
>
> ``%r`` (which calls ``__repr__`` and returns a ``str``) is not supported.
>
>
> Proposed variations
> ===================
>
> It was suggested to let ``%s`` accept numbers, but since numbers have their
> own
> format codes this idea was discarded.
>
> It has been suggested to use ``%b`` for bytes as well as ``%s``.  This was
> rejected as not adding any value either in clarity or simplicity.
>
> It has been proposed to automatically use ``.encode('ascii','strict')`` for
> ``str`` arguments to ``%s``.
>
>   - Rejected as this would lead to intermittent failures.  Better to have
> the
>     operation always fail so the trouble-spot can be correctly fixed.
>
> It has been proposed to have ``%s`` return the ascii-encoded repr when the
> value is a ``str`` (b'%s' % 'abc'  --> b"'abc'").
>
>   - Rejected as this would lead to hard to debug failures far from the
> problem
>     site.  Better to have the operation always fail so the trouble-spot can
> be
>     easily fixed.
>
> Originally this PEP also proposed adding format-style formatting, but it was
> decided that format and its related machinery were all strictly text (aka
> ``str``) based, and it was dropped.
>
> Various new special methods were proposed, such as ``__ascii__``,
> ``__format_bytes__``, etc.; such methods are not needed at this time, but
> can
> be visited again later if real-world use shows deficiencies with this
> solution.
>
>
> Objections
> ==========
>
> The objections raised against this PEP were mainly variations on two
> themes::
>
>   - the ``bytes`` and ``bytearray`` types are for pure binary data, with no
>     assumptions about encodings
>   - offering %-interpolation that assumes an ASCII encoding will be an
>     attractive nuisance and lead us back to the problems of the Python 2
>     ``str``/``unicode`` text model
>
> As was seen during the discussion, ``bytes`` and ``bytearray`` are also used
> for mixed binary data and ASCII-compatible segments: file formats such as
> ``dbf`` and ``pdf``, network protocols such as ``ftp`` and ``email``, etc.
>
> ``bytes`` and ``bytearray`` already have several methods which assume an
> ASCII
> compatible encoding.  ``upper()``, ``isalpha()``, and ``expandtabs()`` to
> name
> just a few.  %-interpolation, with its very restricted mini-language, will
> not
> be any more of a nuisance than the already existing methods.
>
> Some have objected to allowing the full range of numeric formatting codes
> with
> the claim that decimal alone would be sufficient.  However, at least two
> formats (dbf and pdf) make use of non-decimal numbers.
>
>
> Footnotes
> =========
>
> .. [1] http://docs.python.org/2/library/stdtypes.html#string-formatting
> .. [2] neither string.Template, format, nor str.format are under
> consideration
> .. [3] https://mail.python.org/pipermail/python-dev/2014-January/131518.html
> .. [4] to use a str object in a bytes interpolation, encode it first
> .. [5] %c is not an exception as neither of its possible arguments are str
> .. [6] http://docs.python.org/3/c-api/buffer.html
>        examples:  ``memoryview``, ``array.array``, ``bytearray``, ``bytes``
> .. [7] http://docs.python.org/3/reference/datamodel.html#object.__bytes__
> .. [8]
> https://mail.python.org/pipermail/python-dev/2014-February/132750.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/dholth%40gmail.com


More information about the Python-Dev mailing list