[Python-Dev] PEP 460: allowing %d and %f and mojibake

Stephen J. Turnbull stephen at xemacs.org
Sun Jan 12 21:02:41 CET 2014


Georg Brandl writes:

 > > if it weren't for your stupid maximalist opposition).
 > 
 > Can you please stop throwing personal insults around?  You don't have to
 > resort to that level.

Ethan's posts (as an example of one general trend in this thread) are
pretty frustrating, you have to admit.

MAL posted straight out that the Python 2 model of text makes it
easier for him to write some programs, so he's all for reintroducing
it.  And that is the whole truth of the matter.  Although I disagree
with him, I appreciate his honesty.

But people keep posting "we don't want Python 2's confounding of text
and binary, we just want bytes with (nearly) all the functionality of
strings [because they are (partially|really) encoded text]".  Some of
them actually use the literal word "text" in their justification!

That's, well, what would you call it?  Either they know what they're
saying, in which case it's disingenuous at best, or they don't know
what they're saying, in which case it's a proposal based on a clear
misunderstanding of the situation.  The problem is not going to go
away just because they *say* they don't want to reintroduce Python 2
text processing.  That is precisely what this proposal is *intended*
to do, whether in the limited form proposed by Antoine or in the much
more extensive form that folks like Ethan want.
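
For concreteness, a minimal sketch of the separation at stake,
assuming current Python 3 semantics.  The variable names are
illustrative only and come from no post in this thread:

```python
# Python 3 keeps text (str) and binary data (bytes) as distinct types.
header = b"Content-Length: "   # bytes, e.g. as read off the wire
length = 42

# Mixing the two implicitly raises TypeError instead of coercing.
try:
    header + str(length)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The explicit route: encode the text and name the encoding.
line = header + str(length).encode("ascii")
```

In Python 2 the concatenation in the try block would have succeeded,
because str was the byte type there.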

What "maximalists" mean is that they promise not to abuse Python 2
text processing when writing Python 3 programs.  This promise is
highly unlikely to be kept for two reasons.  First, they can't make
that promise on behalf of third parties, who for various reasons
certainly will abuse these features to avoid the encoded-text-to-
Unicode-text and vice-versa conversions.  Second, I doubt they
themselves will keep the promise to my satisfaction because their
definition of "text" is ambiguous.  When it's convenient for them to
use text-processing operations on bytes, they'll say "oh, yes, these
are conventionally considered text-processing features, but that's
just an accident of the particular configuration of bytes -- yup,
bytes -- I'm processing."

You could argue that this "abuse" isn't *abuse*.  That it's covered by
"consenting adults".  By the same token, so is smoking in a crowded
elevator -- if you don't like it, don't use the elevator!  Of course
in applications used only by the author, there's no abuse (at least
not of others! :-/ )

But Nick's important example of web frameworks demonstrates the
problem: unless they convert to text where appropriate, they're just
pushing the problem off on application writers.  Sometimes passing on
data as bytes is appropriate, of course, but the framework authors are
likely to be biased in favor of doing that, and it's not hard to
imagine frameworks ported from Python 2 passing on the problem
wholesale on the grounds that "we returned str in Python 2 which is
bytes in Python 3, and since we were processing bytes the whole time,
we see no reason to change the 'ABI'."  Of course the application
writers thought they were receiving text "in an inconvenient and
ambiguous form".  IMO, with the proposed changes, that is likely to
continue indefinitely, negating some of the gains I expected to
receive from Python 3. :-(
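
A rough sketch of the two framework styles described above.  Every
name here is hypothetical and no real framework's API is implied; the
parsing is deliberately naive (no percent-decoding) and UTF-8 is an
assumed default:

```python
# Hypothetical helpers, for illustration only.

def get_param_bytes(raw_query: bytes, name: bytes) -> bytes:
    """Ported-from-Python-2 style: hand the application raw bytes."""
    for pair in raw_query.split(b"&"):
        key, _, value = pair.partition(b"=")
        if key == name:
            return value
    return b""


def get_param_text(raw_query: bytes, name: str,
                   encoding: str = "utf-8") -> str:
    """Convert at the boundary so the application receives str."""
    value = get_param_bytes(raw_query, name.encode(encoding))
    return value.decode(encoding)


raw = "name=G\u00f6del".encode("utf-8")
as_bytes = get_param_bytes(raw, b"name")  # application must decode it
as_text = get_param_text(raw, "name")     # already text
```

With the first helper the choice of encoding is pushed onto every
application writer; with the second it is made once, at the boundary.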

Note: there are a lot of high-level frameworks like Django that even
in Python 2 basically went to Unicode everywhere internally.  I don't
deny that.  I think that Python 3 as currently constituted makes it a
lot easier to decide where to convert, and should take some of the
burden off the high-level frameworks.
Approving this PEP, especially in a maximalist form, will blur the
lines.


