[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
victor.stinner at gmail.com
Sun Feb 23 12:30:25 CET 2014
First, there is a warning in the reST syntax:
System Message: WARNING/2 (pep-0461.txt, line 53)
> This area of programming is characterized by a mixture of binary data and
> ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a
> restricted %-interpolation for ``bytes`` and ``bytearray`` will aid both in
> writing new wire format code, and in porting Python 2 wire format code.
You may give some examples here: HTTP (Latin1 headers, binary body),
SMTP, FTP, etc.
> All the numeric formatting codes (such as ``%x``, ``%o``, ``%e``, ``%f``,
> ``%g``, etc.) will be supported, and will work as they do for str, including
> the padding, justification and other related modifiers.
IMO you should give the exhaustive list here and we should only
support one formatter for integers: %d. Python 2 supports "%d", "%u"
and "%i" with "%u" marked as obsolete. Python 3.5 should not
reintroduce obsolete formatters. If you want to use the same code base
for Python 2.6, 2.7 and 3.5: modify your code to only use %d. The same
rule applies to the 2to3 tool: modify your source code to be compatible
with Python 3.
Please also mention all flags: #, +, -, '0', ' '.
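
For illustration, here is a minimal sketch of the flags listed above applied to bytes. It assumes Python 3.5 or later, where %-formatting on bytes is available; the variable names are only for the example.

```python
# Assumes Python 3.5+, where bytes objects support %-formatting.
# Each line exercises one of the flags: '0', '+', ' ', '-', '#'.
zero_padded = b'%05d' % 42     # '0' flag: pad with zeros to width 5
forced_sign = b'%+d' % 42      # '+' flag: always emit a sign
space_sign = b'% d' % 42       # ' ' flag: blank in place of a plus sign
left_justified = b'%-5d|' % 42 # '-' flag: left-justify within width 5
alt_form_hex = b'%#x' % 255    # '#' flag: alternate form, 0x prefix
two_decimals = b'%.2f' % 3.14159

print(zero_padded, forced_sign, space_sign,
      left_justified, alt_form_hex, two_decimals)
```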
> ``%c`` will insert a single byte, either from an ``int`` in range(256), or
> a ``bytes`` argument of length 1, not from a ``str``.
I'm not sure that supporting a bytes argument of length 1 is useful, but
it should not be hard to implement and may be convenient.
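
A short sketch of the two accepted argument types for %c, and of the rejected str case, assuming the behaviour described in the quoted PEP text as shipped in Python 3.5 or later:

```python
# Assumes Python 3.5+ behaviour for b'%c'.
from_int = b'%c' % 65        # an int in range(256)
from_bytes = b'%c' % b'A'    # a bytes argument of length 1

# A str argument is not accepted, even a one-character one.
try:
    b'%c' % 'A'
except TypeError:
    str_rejected = True
else:
    str_rejected = False

print(from_int, from_bytes, str_rejected)
```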
> ``%s`` is restricted in what it will accept::
> - input type supports ``Py_buffer``?
>   use it to collect the necessary bytes
> - input type is something else?
>   use its ``__bytes__`` method; if there isn't one, raise a
Hum, you may mention that bytes(n: int) creates a bytes string of n
null bytes, but b'%s' % 123 will raise an error because
int.__bytes__() is not defined. Just to be more explicit.
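
To make the difference concrete, a minimal sketch assuming Python 3.5 or later. The Packet class is a hypothetical type invented for the example; it only exists to show the ``__bytes__`` path.

```python
# Assumes Python 3.5+. Packet is a hypothetical class for illustration.
class Packet:
    def __bytes__(self):
        return b'\x01\x02'

# bytes(n) with an int builds n null bytes.
zeros = bytes(3)

# %s accepts objects exposing the buffer protocol...
from_buffer = b'%s' % (memoryview(b'abc'),)
# ...or objects defining __bytes__.
from_dunder = b'[%s]' % (Packet(),)

# An int has no __bytes__ method, so interpolation fails
# instead of silently producing null bytes.
try:
    b'%s' % 123
except TypeError:
    int_rejected = True
else:
    int_rejected = False

print(zeros, from_buffer, from_dunder, int_rejected)
```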
> ``%a`` will call :func:``ascii()`` on the interpolated value's
> This is intended as a debugging aid, rather than something that should be
> in production. Non-ascii values will be encoded to either ``\xnn`` or
(You forgot the "\Uhhhhhhhh" representation.)
What is the use case of this *new* formatter? How do you use it?
print(b'%a' % 123) may emit a BytesWarning and may lead to bugs.
IMO %a should be restricted to str % args.
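
For reference, a sketch of what %a produces on bytes, assuming the behaviour described in the quoted PEP text as available in Python 3.5 or later. The result is the ascii() representation of the value, encoded as ASCII bytes:

```python
# Assumes Python 3.5+, where b'%a' is available.
as_int = b'%a' % 123
as_latin1 = b'%a' % 'caf\xe9'        # non-ASCII escaped as \xnn
as_astral = b'%a' % '\U0001F600'     # escaped as \Uhhhhhhhh

# Each result matches ascii() of the value, encoded to ASCII.
matches_ascii = all(
    (b'%a' % (value,)) == ascii(value).encode('ascii')
    for value in (123, 'caf\xe9', '\U0001F600')
)

print(as_int, as_latin1, as_astral, matches_ascii)
```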
> It has been suggested to use ``%b`` for bytes as well as ``%s``.
PyArg_ParseTuple() uses %y format for the exact bytes type.
> - Pro: clearly says 'this is bytes'; should be used for new code.
> - Con: does not exist in Python 2.x, so we would have two ways of doing
> same thing, ``%s`` and ``%b``, with no difference between them.
IMO it's useless: b'%s' % bytes just works fine in Python 2 and Python 3.
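
A small sketch of the two spellings side by side, assuming Python 3.5 or later, where both %s and %b are accepted for bytes. Only the %s form also runs on Python 2.

```python
# Assumes Python 3.5+; on Python 2 only the %s line would work.
payload = b'body'
with_s = b'HDR:%s' % payload
with_b = b'HDR:%b' % payload

# Both codes produce the same bytes.
same_result = (with_s == with_b)

print(with_s, with_b, same_result)
```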
I would like to help you to implement the PEP. IMO we should share as
much code as possible with PyUnicodeObject. Something using the
stringlib and maybe a new PyBytesWriter API which would have an API
close to PyUnicodeWriter API. We should also try to share code between
PyBytes_Format() and PyBytes_FromFormat().