[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Wed Mar 26 15:35:52 CET 2014

On 03/26/2014 03:10 AM, Victor Stinner wrote:
> 2014-03-25 23:37 GMT+01:00 Ethan Furman:
>>
>> ``%a`` will call ``ascii()`` on the interpolated value.
>
> I'm not sure that I understood correctly: is the "%a" format
> supported? The result of ascii() is a Unicode string. Does it mean
> that ("%a" % obj) should give the same result as
> ascii(obj).encode('ascii', 'strict')?

Changed to:
-------------------------------------------------------------------------------
``%a`` will give the equivalent of
``repr(some_obj).encode('ascii', 'backslashreplace')`` on the interpolated
value.  Use cases include developing a new protocol and writing landmarks
into the stream; debugging data going into an existing protocol to see if
the problem is the protocol itself or bad data; a fall-back for a serialization
format; or any situation where defining ``__bytes__`` would not be appropriate
but a readable/informative representation is needed [8].
-------------------------------------------------------------------------------
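
As a rough check of the definition above, the sketch below compares ``b'%a'`` against the quoted ``repr(...).encode(...)`` expression. It assumes an interpreter where bytes %-formatting is available (Python 3.5 or later); ``percent_a_reference`` and ``Landmark`` are hypothetical names made up for illustration, not part of the PEP.

```python
def percent_a_reference(obj):
    """Reference expression copied from the quoted definition of %a."""
    return repr(obj).encode('ascii', 'backslashreplace')


class Landmark:
    """Hypothetical type with a readable repr and no __bytes__ method."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return 'Landmark(%r)' % self.name


# A float, a plain str, a custom object, and a str with a non-ASCII character.
samples = [3.14, 'def', Landmark('start'), 'caf\xe9']

# Wrap each value in a 1-tuple so it is always treated as a single argument.
results = [(b'%a' % (obj,), percent_a_reference(obj)) for obj in samples]

for actual, expected in results:
    print(actual, actual == expected)
```

The ``Landmark`` case corresponds to the "writing landmarks into the stream" use case: the object defines no ``__bytes__``, yet ``%a`` still yields a readable ASCII representation.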

> Would it be possible to add a table or list to summarize supported
> format characters? I found:
>
> - single byte: %c
> - integer: %d, %u, %i, %o, %x, %X, %f, %g, "etc." (can you please
> complete "etc." ?)
> - bytes and __bytes__ method: %s
> - ascii(): %a

Changed to:
-------------------------------------------------------------------------------
%-interpolation
---------------

All the numeric formatting codes (``d``, ``i``, ``o``, ``u``, ``x``, ``X``,
``e``, ``E``, ``f``, ``F``, ``g``, ``G``, and any that are subsequently added
to Python 3) will be supported, and will work as they do for str, including
the padding, justification and other related modifiers (currently ``#``, ``0``,
``-``, `` `` (space), and ``+`` (plus any added to Python 3)).  The only
non-numeric codes allowed are ``c``, ``s``, and ``a``.

For the numeric codes, the only difference between ``str`` and ``bytes`` (or
``bytearray``) interpolation is that the results from these codes will be
ASCII-encoded text, not Unicode.  In other words, for any numeric formatting
code ``%x``::
-------------------------------------------------------------------------------
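
One way to illustrate the property described above (numeric codes on ``bytes`` produce the ASCII encoding of what ``str`` would produce) is the sketch below. It assumes Python 3.5 or later; the helper ``matches_str_formatting`` and the chosen format strings are only examples, not text from the PEP.

```python
# (format, value) pairs covering several numeric codes and modifiers.
cases = [
    ('%d', 42),
    ('%+05d', 42),
    ('%#x', 255),
    ('%-6o|', 8),
    ('% .3e', 1234.5),
    ('%G', 0.000001),
]


def matches_str_formatting(fmt, value):
    """Compare bytes interpolation with ASCII-encoded str interpolation."""
    bytes_result = fmt.encode('ascii') % value
    str_result = (fmt % value).encode('ascii')
    return bytes_result == str_result


checks = [matches_str_formatting(fmt, value) for fmt, value in cases]
print(all(checks))
```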

> I don't understand the purpose of this sentence. Does it mean that %a
> must not be used? IMO this sentence can be removed.

The sentence about %a being for debugging has been removed.

>> Non-ASCII values will be encoded to either ``\xnn`` or ``\unnnn``
>> representation.
>
> Unicode is larger than that! print(ascii(chr(0x10ffff))) => '\U0010ffff'

Removed.  With the explicit reference to the 'backslashreplace' error handler,
anyone who wants to know what it might look like can refer to that.
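
For reference, a small sketch of what the 'backslashreplace' error handler produces for code points of different sizes; the sample characters and the names ``samples`` and ``escaped`` are arbitrary choices for illustration.

```python
# One character from each range that gets a different escape width.
samples = {
    'below U+0100': '\xe9',
    'below U+10000': '\u20ac',
    'beyond the BMP': '\U0001f600',
}

# Encode each character to ASCII, replacing it with a backslash escape.
escaped = {
    label: char.encode('ascii', 'backslashreplace')
    for label, char in samples.items()
}

for label, value in escaped.items():
    print(label, value)
```

The three results use the ``\xnn``, ``\unnnn``, and ``\Unnnnnnnn`` forms respectively, which covers the case Victor raised.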

>> .. note::
>>
>>      If a ``str`` is passed into ``%a``, it will be surrounded by quotes.
>
> And:
>
> - bytes gets a "b" prefix and surrounded by quotes as well  (b'...')
> - the quote ' is escaped as \' if the string contains quotes ' and "

This shouldn't be an issue now with the new definition, which no longer references the ascii() function.

> Can you also please add examples for %a?
-------------------------------------------------------------------------------
Examples::

     >>> b'%a' % 3.14
     b'3.14'

     >>> b'%a' % b'abc'
     b"b'abc'"

     >>> b'%a' % 'def'
     b"'def'"
-------------------------------------------------------------------------------

>> Proposed variations
>> ===================
>>
>
> It would be fair to mention also a whole different PEP, Antoine's PEP 460!

My apologies for the omission.
-------------------------------------------------------------------------------
A competing PEP, ``PEP 460 Add binary interpolation and formatting`` [9], also
exists.

.. [9] http://python.org/dev/peps/pep-0460/
-------------------------------------------------------------------------------

Thank you, Victor.