[Python-Dev] PEP 460: allowing %d and %f and mojibake

Greg Ewing greg.ewing at canterbury.ac.nz
Sun Jan 12 23:10:59 CET 2014


Paul Moore wrote:
> On 12 January 2014 18:26, Ethan Furman <ethan at stoneleaf.us> wrote:
> 
>>I'm arguing from three PoVs:
>>1) 2 & 3 compatible code base
>>2) having the bytes type /be/ the boundary type
>>3) readable code
> 
> The only one of these that I can see being in any way an argument against
> 
> def int_to_bytes(n):
>     return str(n).encode('ascii')
> 
> b'Content Length: ' + int_to_bytes(len(binary_data))
> 
> is (3),

I think the readability argument becomes a bit sharper when
you consider more complex examples, e.g. if I have a tuple
of 3 floats that I want to put into a PDF file, then

    b"%f %f %f" % my_floats

is considerably clearer than

    b" ".join(float_to_bytes(f) for f in my_floats)
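
To make the comparison concrete, here is a minimal sketch of both spellings side by side. Note that float_to_bytes is a hypothetical helper (nothing by that name exists in the standard library), written by analogy with the int_to_bytes quoted above, and the sample values are made up. The interpolation line assumes an interpreter where %f works on bytes, which as far as I know means Python 3.5 or later.

```python
def float_to_bytes(x):
    # Hypothetical helper: format the number as text, then encode it
    # explicitly as ASCII, mirroring int_to_bytes from the quoted message.
    return ("%f" % x).encode("ascii")

my_floats = (1.0, 2.5, 3.25)  # illustrative values only

# Explicit spelling: convert each float separately, then join with spaces.
joined = b" ".join(float_to_bytes(f) for f in my_floats)

# Interpolated spelling: assumes bytes %-formatting is available (3.5+).
interpolated = b"%f %f %f" % my_floats

print(joined)
print(interpolated)
```

Both spellings should produce the same bytes; the difference is only in how much of the conversion machinery the reader has to wade through.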

> My reading of Nick's refusal is that %d takes a value which is
> semantically a number, converts it into a base-10 representation
> (which is semantically a *string*, not a sequence of bytes[1]) and
> then *encodes* that string into a series of bytes using the ASCII
> encoding. That is *two* semantic transformations, and one (the ASCII
> encoding) is *implicit*. Specifically, it's implicit because (a) the
> normal reading of %d is "produce the base-10 representation of a
> number", and a base-10 representation is a *string*, and (b) because
> nowhere has ASCII been mentioned

It's indicated (I won't say "implied", see below) by the
fact that we're interpolating it into a bytes object rather
than a string.

This is no more or less implicit than the fact that when
we write

    b"ABC"

then we're saying that those characters are to be encoded
in ASCII, and not EBCDIC or UTF-16 or...
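
A small sketch of what that choice amounts to in practice. The codec names used here ("cp500" as one EBCDIC variant, "utf-16-le" as one UTF-16 byte order) are picked purely for illustration; other EBCDIC code pages or byte orders would give different bytes again.

```python
literal = b"ABC"

# The literal holds the ASCII code points of the characters as written.
ascii_encoded = "ABC".encode("ascii")

# The same three characters under two other encodings, for contrast.
ebcdic_encoded = "ABC".encode("cp500")      # one EBCDIC code page
utf16_encoded = "ABC".encode("utf-16-le")   # UTF-16, little-endian, no BOM

print(list(literal))
print(literal == ascii_encoded)
print(literal == ebcdic_encoded, literal == utf16_encoded)
```

Nothing in the literal itself names ASCII, yet nobody reading it expects the EBCDIC or UTF-16 bytes.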

BTW, there's a problem with bandying around the words
"implicit" and "explicit", because they depend on your frame
of reference. For example, one person might say that the
fact that b"%s" encodes into ASCII is implicit, because
ASCII isn't written down in the code anywhere. But another
person might say it's explicit, because the manual explicitly
says that stuff interpolated into a bytes object is encoded
as ASCII.

So arguments of the form "X is bad because it's not
explicit" are prone to getting people talking past each
other.

-- 
Greg

