[Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 3

Thu Mar 27 20:18:41 CET 2014

On Thu, Mar 27, 2014 at 2:53 PM, Guido van Rossum <guido at python.org> wrote:
> So what's the use case for Python 2/3 compatible code? IMO the main use case
> for the PEP is simply to be able to construct bytes from a combination of a
> template and some input that may include further bytes and numbers. E.g. in
> asyncio when you write an HTTP client or server you have to construct bytes
> to write to the socket, and I'd be happy if I could write b'HTTP/1.0 %d
> %b\r\n' % (status, message) rather than having to use
> str(status).encode('ascii') and concatenation or join().
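
For reference, here is a rough sketch of the two spellings being compared. It assumes Python 3.5 or later, where % formatting on bytes is available; the function names are made up for illustration and appear nowhere in the PEP or in asyncio.

```python
# Assumes Python 3.5+ (bytes % formatting). Function names are hypothetical.

def status_line_percent(status, message):
    # The spelling the PEP enables: %d formats the int as ASCII digits,
    # %b interpolates an existing bytes object unchanged.
    return b'HTTP/1.0 %d %b\r\n' % (status, message)

def status_line_join(status, message):
    # The workaround without the PEP: encode the number by hand and join.
    return b' '.join([b'HTTP/1.0', str(status).encode('ascii'), message]) + b'\r\n'

line = status_line_percent(200, b'OK')
print(line)
```

Both functions should produce the same bytes; the first just keeps the shape of the output visible in the template.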

It seems to be notoriously difficult to understand or explain why
Unicode can still be very hard in Python 3 or in code that is in the
middle of being ported or has to run in both interpreters. As far as I
can tell, part of it is when a symbol has type(str or bytes) depending on context
(declared as if we had a static type system with union types); some of
it is because incorrect mixing can happen without an exception, only
to be discovered later and far away in space and time from the error
(worst of all in a serialized file), and part of it is all of the not
easily checkable "types" a particular Unicode object has depending on
whether it contains surrogates or codes > n. Sometimes you might
simply disagree about whether an API should be returning bytes or
Unicode in mildly ambiguous cases like base64 encoding. Sometimes
Unicode is just intrinsically complicated.
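
Two of these failure modes are easy to show in a few lines. This is only a sketch, assuming Python 3; the header name and the sample bytes are arbitrary choices for illustration.

```python
# Assumes Python 3. Values are arbitrary examples.

# 1. Mixing without an exception: text formatting accepts a bytes
#    argument and silently embeds its repr in the result.
mixed = "Content-Type: %s" % b"text/plain"
print(mixed)  # Content-Type: b'text/plain'

# 2. A hidden "type" of str: decoding undecodable bytes with
#    surrogateescape gives a perfectly valid str that contains a lone
#    surrogate, and that str cannot be encoded as strict UTF-8 later.
name = b"caf\xe9".decode("utf-8", "surrogateescape")
try:
    name.encode("utf-8")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# The original bytes come back only if the same error handler is used.
round_trip = name.encode("utf-8", "surrogateescape")
```

Nothing in the first case fails at the point of the mistake; the stray b'...' only shows up wherever the string is eventually read.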

For me this PEP holds the promise of being able to do work in the
bytes domain, with no accidental mixing ever, when I *really* want
bytes. For 2+3 code, mistakes would raise exceptions sometimes in Python 2
and always in Python 3. I hope this is less
error prone in strict domains than for example u"string
processing".encode('latin1'). And I hope that there is very little
type(str or int) in HTTP for example or other "legitimate" bytes
domains but I don't know; I suspect that if you have a lot of problems
with bytes' %s then it's a clue you should use (u"%s" %
(argument)).encode() instead.
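
A small sketch of what I mean, assuming Python 3.5 or later where bytes % formatting exists; the Host header and hostname are just placeholder values.

```python
# Assumes Python 3.5+. The header and hostname are placeholder values.

# Strict bytes domain: passing text to a bytes template is an error
# right at the point of the mistake.
try:
    b"Host: %s\r\n" % ("example.com",)
    raised = False
except TypeError:
    raised = True

# With real bytes the same template works.
header_bytes = b"Host: %s\r\n" % (b"example.com",)

# If the arguments are mostly text anyway, format in the text domain
# and encode once at the boundary.
header_text = (u"Host: %s\r\n" % ("example.com",)).encode("ascii")
```

Both successful paths give the same bytes; the difference is which domain the mistakes get caught in.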

sprintf()'s version of %s just takes a char* and puts it in without
doing any type conversion of course. IANACL (I am not a C lawyer).