[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2

Victor Stinner victor.stinner at gmail.com
Sun Feb 23 12:30:25 CET 2014


Hi,

First, there is a warning in the reST syntax:

System Message: WARNING/2 (pep-0461.txt, line 53)

> This area of programming is characterized by a mixture of binary data and
> ASCII compatible segments of text (aka ASCII-encoded text).  Bringing back a
> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
> writing new wire format code, and in porting Python 2 wire format code.

You may give some examples here: HTTP (Latin1 headers, binary body),
SMTP, FTP, etc.
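For instance, an HTTP response mixes ASCII header text with an arbitrary binary body. A minimal sketch, assuming an interpreter that implements this PEP (Python 3.5 or later); ``build_response`` is a hypothetical helper, not something from the PEP:

```python
# Assumes Python 3.5+, where %-formatting on bytes (this PEP) is available.
def build_response(status, reason, body):
    """Hypothetical helper: ASCII header lines followed by a binary body."""
    header = b"HTTP/1.1 %d %s\r\nContent-Length: %d\r\n\r\n" % (
        status,     # int, formatted as ASCII digits
        reason,     # bytes, inserted as-is
        len(body),  # int
    )
    return header + body


response = build_response(200, b"OK", b"\x00\x01binary\xff")
```

The header is built without any encode/decode round trip, and the body bytes are never interpreted as text.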

> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> ``%g``, etc.) will be supported, and will work as they do for str, including
> the padding, justification and other related modifiers.

IMO you should give the exhaustive list here, and we should only
support one formatter for integers: %d. Python 2 supports "%d", "%u"
and "%i", with "%u" marked as obsolete. Python 3.5 should not
reintroduce obsolete formatters. If you want to use the same code base
for Python 2.6, 2.7 and 3.5, modify your code to only use %d. The same
rule applies to the 2to3 tool: modify your source code to be compatible
with Python 3.

Please also mention all flags: #, +, -, '0', ' '.
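To make the list concrete, here is a rough sketch of how the numeric codes and flags would behave on bytes if they work as they do for str. It assumes an interpreter that implements this PEP (Python 3.5 or later); the variable names are only for illustration:

```python
# Assumes Python 3.5+ (bytes %-formatting as proposed in this PEP).
alternate = b"%#x" % 255       # '#' flag: alternate form with 0x prefix
signed = b"%+d" % 5            # '+' flag: always emit a sign
left = b"%-4d|" % 5            # '-' flag: left-justify in a 4-wide field
zero_padded = b"%04d" % 42     # '0' flag: pad with zeros
spaced = b"% d" % 5            # ' ' flag: blank where a '+' sign would go
fixed = b"%.2f" % 3.14159      # precision works as for str
exponent = b"%e" % 1000.0      # exponent notation
```

Every result is a bytes object containing only ASCII characters.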

> ``%c`` will insert a single byte, either from an ``int`` in range(256), or
> from a ``bytes`` argument of length 1, not from a ``str``.

I'm not sure that supporting a bytes argument of length 1 is useful, but
it should not be hard to implement and may be convenient.
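A small sketch of the two accepted argument types for ``%c`` and of the cases that should be refused, assuming an interpreter that implements this PEP (Python 3.5 or later). ``rejected`` is a hypothetical helper used only to keep the example short:

```python
# Assumes Python 3.5+ (bytes %-formatting as proposed in this PEP).
from_int = b"%c" % 65        # an int in range(256)
from_bytes = b"%c" % b"A"    # a bytes object of length 1


def rejected(arg):
    """Hypothetical helper: True if %c refuses the argument."""
    try:
        b"%c" % arg
    except (TypeError, OverflowError):
        return True
    return False
```

With this, ``rejected("A")`` (a str), ``rejected(256)`` (out of range) and ``rejected(b"AB")`` (two bytes) should all be true.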

> ``%s`` is restricted in what it will accept::
>
>   - input type supports ``Py_buffer`` [6]_?
>     use it to collect the necessary bytes
>
>   - input type is something else?
>     use its ``__bytes__`` method [7]_ ; if there isn't one, raise a
>     ``TypeError``

Hum, you may mention that bytes(n: int) creates a bytes string of n
null bytes, but b'%s' % 123 will raise an error because
int.__bytes__() is not defined. Just to be more explicit.
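Something like the following could illustrate the difference. It is only a sketch, assuming an interpreter that implements this PEP (Python 3.5 or later); the ``Packet`` class is made up for the example:

```python
# Assumes Python 3.5+ (bytes %-formatting as proposed in this PEP).
class Packet:
    """Hypothetical type that provides __bytes__."""

    def __bytes__(self):
        return b"\x01\x02"


null_bytes = bytes(3)                        # constructor: three null bytes
from_buffer = b"%s" % memoryview(b"abc")     # supports the buffer protocol
from_dunder = b"%s" % Packet()               # falls back to __bytes__

try:
    b"%s" % 123                              # int has no __bytes__
except TypeError:
    int_rejected = True
else:
    int_rejected = False
```

So ``bytes(3)`` and ``b'%s' % 3`` are deliberately not equivalent: the first succeeds, the second raises.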

> ``%a`` will call :func:``ascii()`` on the interpolated value's
> :func:``repr()``.
> This is intended as a debugging aid, rather than something that should be
> used in production.  Non-ascii values will be encoded to either ``\xnn`` or
> ``\unnnn`` representation.

(You forgot the "/Uhhhhhhhh" representation (it should be a backslash, but
I can't find the key on my Mac keyboard).)

What is the use case of this *new* formatter? How do you use it?
print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.

IMO %a should be restricted to str % args.
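For reference, a sketch of what ``%a`` on bytes would produce, covering all three escape forms. It assumes an interpreter that implements this PEP (Python 3.5 or later), and the variable names are only illustrative:

```python
# Assumes Python 3.5+ (bytes %-formatting as proposed in this PEP).
# %a inserts ascii(value), encoded as ASCII bytes.
latin = b"%a" % "caf\xe9"        # code point below 0x100: \xnn
bmp = b"%a" % "\u20ac"           # other BMP code point: \unnnn
astral = b"%a" % "\U0001F600"    # code point above 0xFFFF: \Unnnnnnnn
number = b"%a" % 123             # same digits as repr(123)
nested = b"%a" % b"abc"          # repr of a bytes object, including the b prefix
```

The last line shows why this reads as a debugging aid: the result contains the ``b'...'`` wrapper rather than the raw bytes.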

> It has been suggested to use ``%b`` for bytes as well as ``%s``.

PyArg_ParseTuple() uses the "y" format for bytes.

>
>   - Pro: clearly says 'this is bytes'; should be used for new code.
>
>   - Con: does not exist in Python 2.x, so we would have two ways of doing
>     the same thing, ``%s`` and ``%b``, with no difference between them.

IMO it's useless: b'%s' % bytes just works fine in Python 2 and Python 3.
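A quick sketch of the "no difference" point, assuming an interpreter where both codes are accepted on bytes (this is the case in Python 3.5 and later, where the PEP was implemented); the names are only illustrative:

```python
# Assumes Python 3.5+, where bytes formatting accepts both %s and %b.
payload = b"\xde\xad\xbe\xef"

with_s = b"id=%s;" % payload   # spelling that also works on Python 2 str
with_b = b"id=%b;" % payload   # spelling that only exists for bytes

# bytearray takes the same codes; the result keeps the left operand's type.
as_bytearray = bytearray(b"id=%s;") % payload
```

Code shared with Python 2 has to stick to ``%s``, since ``%b`` does not exist there.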

--

I would like to help you implement the PEP. IMO we should share as
much code as possible with PyUnicodeObject: something using the
stringlib, and maybe a new PyBytesWriter API close to the
PyUnicodeWriter API. We should also try to share code between
PyBytes_Format() and PyBytes_FromFormat().

Victor

