[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Antoine Pitrou solipsis at pitrou.net
Wed Jan 8 13:01:21 CET 2014


Hi Victor,

On Mon, 6 Jan 2014 14:24:50 +0100
Victor Stinner <victor.stinner at gmail.com> wrote:
> Hi,
> 
> bytes % args and bytes.format(args) are requested by Mercurial and
> Twisted projects. Issue #3982 stalled because nobody proposed a
> complete definition of the "new" features. Here is an attempt, as a PEP.

There is a good use case at:
https://mail.python.org/pipermail/python-ideas/2014-January/024803.html

Regards

Antoine.


> 
> The PEP is a draft with open questions. First, I'm not sure that both
> bytes%args and bytes.format(args) are needed. The implementation of
> .format() is more complex, so why not add only bytes%args? Then,
> the following points must be decided to define the complete list of
> supported features (formatters):
> 
> * Format integer to hexadecimal? ``%x`` and ``%X``
> * Format integer to octal? ``%o``
> * Format integer to binary? ``{!b}``
> * Alignment?
> * Truncating? Truncate or raise an error?
> * format keywords? ``b'{arg}'.format(arg=5)``
> * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
> * Floating point number?
> * ``%i``, ``%u`` and ``%d`` formats for integer numbers?
> * Signed number? ``%+i`` and ``%-i``
> 
> 
> HTML version of the PEP:
> http://www.python.org/dev/peps/pep-0460/
> 
> Inline copy:
> 
> PEP: 460
> Title: Add bytes % args and bytes.format(args) to Python 3.5
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner <victor.stinner at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 6-Jan-2014
> Python-Version: 3.5
> 
> 
> Abstract
> ========
> 
> Add ``bytes % args`` operator and ``bytes.format(args)`` method to
> Python 3.5.
> 
> 
> Rationale
> =========
> 
> ``bytes % args`` and ``bytes.format(args)`` were removed in Python 3.
> This operator and this method are requested by Mercurial and Twisted
> developers to ease porting their projects to Python 3.
> 
> Python 3 encourages formatting text first and then encoding it to
> bytes. In some cases this does not make sense, because the arguments
> are already bytes strings. A typical use case is a binary network
> protocol, since data are sent to and received from sockets as bytes.
> For example, SMTP, SIP, HTTP, IMAP, POP and FTP use ASCII commands
> interspersed with binary data.
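
A minimal illustration of the problem described above, in plain Python 3: the "format text, then encode" idiom works when every argument is text or a number, but a bytes argument is silently converted with ``str()``, which inserts its ``repr()``. The helper name ``format_then_encode`` is hypothetical.

```python
def format_then_encode(template, *args):
    """Python 3 idiom: format as text first, then encode the result."""
    return (template % args).encode('ascii')

# Works when all arguments are text or numbers:
header = format_then_encode('Content-Length: %d\r\n', 42)
# header == b'Content-Length: 42\r\n'

# A bytes argument is formatted with str(), i.e. its repr():
host_line = format_then_encode('Host: %s\r\n', b'example.com')
# host_line == b"Host: b'example.com'\r\n", not b'Host: example.com\r\n'
```

Running the interpreter with ``-b`` turns this implicit conversion into a ``BytesWarning``, but it does not make the idiom usable for bytes arguments.
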
> 
> Using multiple ``bytes + bytes`` operations is inefficient because it
> requires temporary buffers and copies, which are slow and waste memory.
> Python 3.3 optimizes ``str2 += str1`` but not ``bytes2 += bytes1``.
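
For comparison, the usual Python 3 alternatives to repeated ``bytes + bytes`` are ``b''.join()`` and an in-place ``bytearray``. A small sketch (the function names are illustrative only); all three return the same bytes and differ only in how much intermediate copying they may do.

```python
def concat_plus(chunks):
    """Repeated bytes + bytes: each step may copy all data built so far."""
    result = b''
    for chunk in chunks:
        result += chunk
    return result


def concat_join(chunks):
    """bytes.join(): computes the total size once, copies each chunk once."""
    return b''.join(chunks)


def concat_bytearray(chunks):
    """bytearray is mutable, so += extends the same buffer in place."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    return bytes(buf)
```
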
> 
> ``bytes % args`` and ``bytes.format(args)`` have been requested since
> 2008, even before the first release of Python 3.0 (see issue #3982).
> 
> ``struct.pack()`` is incomplete: for example, it cannot format a number
> as decimal, and it does not support padding bytes strings.
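
To make the decimal limitation concrete: ``struct.pack()`` emits the binary encoding of an integer, whereas text-based protocols need its decimal ASCII digits, which today means going through ``str`` and encoding.

```python
import struct

n = 42
# struct.pack() gives the big-endian binary encoding of the integer:
binary = struct.pack('>I', n)        # b'\x00\x00\x00*'  (0x2a == ord('*'))
# A text protocol (e.g. an HTTP Content-Length value) needs decimal digits:
decimal = str(n).encode('ascii')     # b'42'
```
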
> 
> Mercurial 2.8 still supports Python 2.4.
> 
> 
> Needed and excluded features
> ============================
> 
> Needed features:
> 
> * Bytes strings: bytes, bytearray and memoryview types
> * Format integer numbers as decimal
> * Padding with spaces and null bytes
> * "%s" should use the buffer protocol, not str()
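
To make the minimal feature set above concrete, here is a pure-Python sketch of what ``bytes % args`` could do. ``bformat()`` is a hypothetical helper, not a proposed API: it supports only ``%%``, ``%c``, ``%s`` and an optional width with space or ``0`` padding, and it omits the null-byte padding variant.

```python
import re

# Hypothetical helper sketching the PEP's minimal feature set; not a real API.
# A conversion is '%', an optional '0' flag, an optional width, then one of
# '%', 'c' or 's'. Any other '%' sequence is copied through unchanged.
_SPEC = re.compile(rb'%(0?)(\d*)([%cs])')


def bformat(fmt, *args):
    remaining = list(args)
    out = bytearray()
    pos = 0
    for m in _SPEC.finditer(fmt):
        out += fmt[pos:m.start()]              # literal bytes before the spec
        pos = m.end()
        zero, width, conv = m.groups()
        if conv == b'%':                       # '%%' produces a single '%'
            out += b'%'
            continue
        if not remaining:
            raise TypeError('not enough arguments for format string')
        arg = remaining.pop(0)
        if conv == b'c':
            piece = bytes([arg])               # one byte; ValueError if not 0-255
        elif isinstance(arg, int):
            piece = str(arg).encode('ascii')   # integers are written in decimal
        else:
            piece = bytes(memoryview(arg))     # buffer protocol, never str()
        if width:
            fill = b'0' if zero else b' '
            piece = piece.rjust(int(width), fill)  # right-align like str % args
        out += piece
    out += fmt[pos:]                           # trailing literal bytes
    if remaining:
        raise TypeError('not all arguments converted during bytes formatting')
    return bytes(out)
```

Passing a ``str`` argument raises ``TypeError``, because ``memoryview()`` only accepts objects supporting the buffer protocol; no implicit encoding happens. Unlike ``str % args``, the sketch does not place zero padding after a minus sign.
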
> 
> The feature set is kept minimal so that the implementation stays as
> simple, and therefore as cheap, as possible. ``str % args`` and
> ``str.format(args)`` are already complex and difficult to maintain;
> their code is heavily optimized.
> 
> Excluded features:
> 
> * no implicit conversion from Unicode to bytes (ex: encode to ASCII or
>   to Latin1)
> * Locale support (``{!n}`` format for numbers). Locales are related to
>   text and usually to an encoding.
> * ``repr()``, ``ascii()``: ``%r``, ``{!r}``, ``%a`` and ``{!a}``
>   formats. ``repr()`` and ``ascii()`` are used for debugging, and their
>   output is displayed in a terminal or a graphical widget. They are
>   more related to text.
> * Attribute access: ``{obj.attr}``
> * Indexing: ``{dict[key]}``
> * Features of struct.pack(). For example, formatting a number as a
>   32-bit unsigned integer in network byte order. ``struct.pack()`` can
>   be used to prepare the arguments instead, so that the implementation
>   stays simple.
> * Features of int.to_bytes().
> * Features of ctypes.
> * New format protocol like a new ``__bformat__()`` method. Since the
>   list of supported types is short, there is no need to add a new
>   protocol. Other types must be explicitly cast.
> * Alternate format for integers. For example, ``'{:#x}'.format(0x123)``
>   gives ``0x123``. It is more related to debugging, and the prefix can
>   easily be written in the format string (ex: ``0x%x``).
> * Relation with format() and the __format__() protocol. bytes.format()
>   and str.format() are unrelated.
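
As an example of preparing arguments outside the formatting code: a 32-bit unsigned integer in network byte order can be produced with ``struct.pack()`` or ``int.to_bytes()``, and an integer prefix can simply be written in the template. The function name ``u32_network`` is hypothetical; the prefix comparison uses today's ``str`` formatting.

```python
import struct


def u32_network(n):
    """Encode n as a 32-bit unsigned integer in network (big-endian) order."""
    return struct.pack('!I', n)


n = 0x123
packed = u32_network(n)        # b'\x00\x00\x01#'
same = n.to_bytes(4, 'big')    # int.to_bytes() gives the same four bytes

# str formatting today: the "0x" prefix written by hand vs. the alternate form
manual = '0x%x' % n            # '0x123'
alternate = '%#x' % n          # '0x123'
```
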
> 
> Unknown:
> 
> * Format integer to hexadecimal? ``%x`` and ``%X``
> * Format integer to octal? ``%o``
> * Format integer to binary? ``{!b}``
> * Alignment?
> * Truncating? Truncate or raise an error?
> * format keywords? ``b'{arg}'.format(arg=5)``
> * ``str % dict`` ? ``b'%(arg)s' % {'arg': 5}``
> * Floating point number?
> * ``%i``, ``%u`` and ``%d`` formats for integer numbers?
> * Signed number? ``%+i`` and ``%-i``
> 
> 
> bytes % args
> ============
> 
> Formatters:
> 
> * ``"%c"``: one byte
> * ``"%s"``: integer or bytes strings
> * ``"%20s"`` pads to 20 bytes with spaces (``b' '``)
> * ``"%020s"`` pads to 20 bytes with zeros (``b'0'``)
> * ``"%\020s"`` pads to 20 bytes with null bytes (``b'\0'``)
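
For reference, the three padding variants can already be expressed with ``bytes.rjust()``. This sketch assumes ``%20s`` right-aligns the value, as it does for ``str``; ``pad`` is a hypothetical helper. Note that in a Python literal, ``b'%\020s'`` is parsed as the octal escape ``\020`` (byte ``0x10``), not as ``\0`` followed by ``20``.

```python
def pad(value, width, fill=b' '):
    """Right-align value in a field of `width` bytes, like %20s does for str."""
    return value.rjust(width, fill)


spaces = pad(b'abc', 20)           # 17 spaces, then b'abc'
zeros = pad(b'abc', 20, b'0')      # 17 b'0' bytes, then b'abc'
nulls = pad(b'abc', 20, b'\0')     # 17 null bytes, then b'abc'
```
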
> 
> 
> bytes.format(args)
> ==================
> 
> Formatters:
> 
> * ``"{!c}"``: one byte
> * ``"{!s}"``: integer or bytes strings
> * ``"{!.20s}"`` pads to 20 bytes with spaces (``b' '``)
> * ``"{!.020s}"`` pads to 20 bytes with zeros (``b'0'``)
> * ``"{!\020s}"`` pads to 20 bytes with null bytes (``b'\0'``)
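
A similarly reduced sketch for ``bytes.format(args)``, handling only empty ``{}`` fields: no ``{{`` escapes, no conversions and no padding. The helper ``bformat_braces`` is hypothetical and only meant to show the substitution rules (integers as decimal, bytes-like objects via the buffer protocol).

```python
def bformat_braces(template, *args):
    """Hypothetical: fill empty {} fields only (no {{ escapes, specs or padding)."""
    parts = template.split(b'{}')
    if len(parts) != len(args) + 1:
        raise ValueError('template has %d fields but %d arguments were given'
                         % (len(parts) - 1, len(args)))
    out = [parts[0]]
    for arg, literal in zip(args, parts[1:]):
        if isinstance(arg, int):
            out.append(str(arg).encode('ascii'))   # integers as decimal
        else:
            out.append(bytes(memoryview(arg)))     # bytes-like via buffer protocol
        out.append(literal)
    return b''.join(out)
```
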
> 
> 
> Examples
> ========
> 
> * ``b'a%sc%s' % (b'b', 4)`` gives ``b'abc4'``
> * ``b'a{}c{}'.format(b'b', 4)`` gives ``b'abc4'``
> * ``b'%c' % 88`` gives ``b'X'``
> * ``b'%%' % ()`` gives ``b'%'``
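
For comparison, the same results can be obtained with operations that already exist in Python 3.4 (variable names are illustrative):

```python
# Equivalent of b'a%sc%s' % (b'b', 4) and b'a{}c{}'.format(b'b', 4)
abc4 = b''.join([b'a', b'b', b'c', str(4).encode('ascii')])

# Equivalent of b'%c' % 88: a single byte built from an integer
x = bytes([88])
```
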
> 
> 
> Criticisms
> ==========
> 
> * The development cost and maintenance cost.
> * In Python 3.3, encoding to ASCII or Latin-1 is as fast as memcpy()
> * Developers must work around the lack of bytes%args and
>   bytes.format(args) anyway to support Python 3.0-3.4
> * bytes.join() is consistently faster than format to join bytes strings.
> * Formatting functions can be implemented in a third party module
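
One possible workaround for the missing feature is to round-trip through Latin-1, which maps every byte value 0-255 to the code point with the same value. A sketch of that approach; ``latin1_format`` is a hypothetical name, and a ``str`` argument containing characters outside Latin-1 would raise ``UnicodeEncodeError``.

```python
def latin1_format(template, *args):
    """Format bytes by round-tripping through Latin-1 text (a workaround).

    Latin-1 maps each byte value 0-255 to the code point with the same value,
    so decoding and re-encoding preserves arbitrary bytes exactly.
    """
    text_args = tuple(
        arg.decode('latin-1') if isinstance(arg, (bytes, bytearray)) else arg
        for arg in args
    )
    return (template.decode('latin-1') % text_args).encode('latin-1')
```
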
> 
> 
> References
> ==========
> 
> * `Issue #3982: support .format for bytes
>   <http://bugs.python.org/issue3982>`_
> * `Mercurial project
>   <http://mercurial.selenic.com/>`_
> * `Twisted project
>   <http://twistedmatrix.com/trac/>`_
> * `Documentation of Python 2 formatting (str % args)
>   <http://docs.python.org/2/library/stdtypes.html#string-formatting>`_
> * `Documentation of Python 2 formatting (str.format)
>   <http://docs.python.org/2/library/string.html#formatstrings>`_
> 
> Copyright
> =========
> 
> This document has been placed in the public domain.
> 