[Python-Dev] PEP 460: allowing %d and %f and mojibake

Paul Moore p.f.moore at gmail.com
Sun Jan 12 23:29:14 CET 2014


On 12 January 2014 22:10, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> I think the readability argument becomes a bit sharper when
> you consider more complex examples, e.g. if I have a tuple
> of 3 floats that I want to put into a PDF file, then
>
>    b"%f %f %f" % my_floats
>
> is considerably clearer than
>
>    b" ".join((float_to_bytes(f) for f in my_floats))

Hmm, I'm not sure I'd agree. I'd quote "explicit is better than
implicit", but given comments below, that would be a mistake :-) Let's
just leave it that I'd probably wrap the whole thing in a
float_list(floats) function in my application, and not *care* how it
was implemented.
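
A minimal sketch of such a wrapper, for illustration only. Greg's
float_to_bytes() is not a real library function, so the version here is
a hypothetical helper that assumes the usual "%f" text format followed
by ASCII encoding:

```python
def float_to_bytes(f):
    # Hypothetical helper (name taken from the quoted example):
    # format the float as text, then encode that text as ASCII.
    return ("%f" % f).encode("ascii")


def float_list(floats):
    # Application-level wrapper: callers never see how the bytes are built.
    return b" ".join(float_to_bytes(f) for f in floats)


print(float_list((1.0, 2.5, 3.25)))  # b'1.000000 2.500000 3.250000'
```

With the wrapper in place, the choice between join and %-formatting is
an implementation detail hidden from the calling code.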

One thing that this does bring up, though, is that all the talk is
about %-formatting. Do the people who are arguing for numeric
formatting have views on what (if any) features will be included in
bytes.format()? It seems to me that recasting many of the discussions
using format() makes it much less "obvious" that adding the features to
bytes formatting is a reasonable thing to do. I won't give specific
examples, because I would be putting words into people's mouths. But I
*would* say that any genuine proposal for numeric formatting in bytes
should be cast as a formal PEP and explicitly document both % and
format() behaviours.
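
To make the comparison concrete, here is the PDF example recast in
format() style. This is only a sketch: it formats a str and then
encodes it, as a stand-in for whatever a bytes.format() might do, and
the variable names are illustrative assumptions:

```python
my_floats = (1.0, 2.5, 3.25)

# %-style on text, then an explicit encode step.
percent_style = ("%f %f %f" % my_floats).encode("ascii")

# format()-style on text, then the same explicit encode step.
format_style = "{:f} {:f} {:f}".format(*my_floats).encode("ascii")

# Both spellings produce the same bytes; they differ only in the
# formatting mini-language that a bytes version would have to support.
print(percent_style == format_style)  # True
```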

> It's indicated (I won't say "implied", see below) by the
> fact that we're interpolating it into a bytes object rather
> than a string.
>
> This is no more or less implicit than the fact that when
> we write
>
>    b"ABC"
>
> then we're saying that those characters are to be encoded
> in ASCII, and not EBCDIC or UTF-16 or...

That's a fair point, and one I had not taken into consideration.

> BTW, there's a problem with bandying around the words
> "implicit" and "explicit", because they depend on your frame
> of reference. For example, one person might say that the
> fact that b"%s" encodes into ASCII is implicit, because
> ASCII isn't written down in the code anywhere. But another
> person might say it's explicit, because the manual explicitly
> says that stuff interpolated into a bytes object is encoded
> as ASCII.

In my defense, I would say that I was trying to clarify Nick's
objections, and it's entirely possible I misrepresented this aspect of
them.

Personally, I agree that it's not as black and white as simply saying
"numeric formatting is wrong", but I think that the fact that %d et al
represent a "double transformation" (from number to string
representation to encoded bytes) is the differentiating factor here.
Proposals that do nothing but interpolation are essentially
convenience wrappers for various combinations of concatenation and
join. Adding "double transformation" formatting codes is a step
change, and needs to be explicitly acknowledged and justified. (If you
*do* manage to justify such codes, there's a secondary question of
precisely what codes should be supported, but we can start by getting
agreement that the *class* of codes is allowed.) PEP 460 explicitly
excludes anything but pure interpolation.
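
The distinction can be sketched as follows. The values are made up for
illustration, and the two-step spelling is simply one way to write out
what a %d code on bytes would have to do implicitly:

```python
# Pure interpolation: every piece is already bytes, so the result is
# equivalent to plain concatenation or join.
name = b"Content-Length"
value = b"42"
header = b": ".join([name, value])

# "Double transformation": a number has no byte form of its own, so it
# must first become text and then be encoded.
n = 42
as_text = "%d" % n                   # step 1: number -> str
as_bytes = as_text.encode("ascii")   # step 2: str -> bytes

print(header)    # b'Content-Length: 42'
print(as_bytes)  # b'42'
```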

> So arguments of the form "X is bad because it's not
> explicit" are prone to getting people talking past each
> other.

Fair point. I hope my above paragraph clarifies my position somewhat better.

Paul

