[Python-Dev] PEP 414

Sat Mar 3 07:35:48 CET 2012

Chris McDonough writes:

 > FWIW, I think this issue's webness may be overestimated.  There happens
 > to be lots and lots of existing UI code which contains complex
 > interactions between unicode literals and nonliterals in web apps, but
 > there's also likely lots of nonweb code that has the same issue.

If we generalize "web" to "wire protocols", I would say that nonweb
code that has the same issue is poorly coded.  I suppose there may be
some similar issues in, say, XML handling, because XML can be used in
binary applications as well as for structuring text (i.e., XML is really
a wire protocol too).  But pure user interface modules like wxPython?
Text should be handled as text, not as bytes that "probably" are
ASCII-encoded or locale-specifically-encoded (or are magic numbers
that happen to be mnemonic when interpreted as ASCII).

I don't say that we should ignore the pain of the nonweb users -- but
it is a different issue, with different solutions.  In particular,
using "native strings" (and distinguishing them by the absence of u'')
is usually a non-solution for nonweb applications, because it carries
the bad practice of pretending that unknown encodings are well-behaved
into an environment where good practice is designed in.

This is quite different from the case for webby usage, where it often
makes sense to handle many low-level operations without ever
converting to text, while the same literal strings may be useful in
both wire and text contexts (and so should be present only once
according to DRY).
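
To illustrate what handling a low-level operation without converting
to text can look like, here is a minimal Python 3 sketch.  The helper
name split_header and the sample header line are hypothetical, chosen
only for illustration; they come from no library discussed here.

```python
from typing import Tuple

# Hypothetical helper: split one raw header line into (name, value)
# while staying in bytes the whole time; nothing is decoded to text.
def split_header(line: bytes) -> Tuple[bytes, bytes]:
    name, _, value = line.partition(b":")
    # Header names are compared case-insensitively, so normalize the name.
    return name.strip().lower(), value.strip()

name, value = split_header(b"Content-Type: text/html; charset=utf-8\r\n")

# Dispatch on the wire-level value directly, still without decoding.
is_html = name == b"content-type" and value.startswith(b"text/html")
```

Decoding would be needed only at the point where the value is shown to
a user or otherwise treated as text.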

(N.B. I suspect that it is also generally possible for webby
applications to avoid native strings without much cost, as Nick showed
in urlparse.  But at least manipulations of the wire protocol without
conversion to text are a plausible optimization.)
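
As a sketch of the urlparse point, assuming Python 3.2 or later, where
the module is named urllib.parse: its parsing functions accept either
str or ASCII-encoded bytes and return results of the matching type, so
the caller never needs a "native string".  The example URL is made up.

```python
from urllib.parse import urlsplit

# Text in, text out.
text_parts = urlsplit("http://example.com/path?q=1")

# Bytes in, bytes out: the wire form is parsed without the caller
# decoding it first.
wire_parts = urlsplit(b"http://example.com/path?q=1")

# Convert explicitly, and only at the boundary where text is wanted.
decoded_parts = wire_parts.decode("ascii")

# Mixing the two kinds of argument is rejected rather than guessed at.
try:
    urlsplit(b"//example.com/path", scheme="http")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Here text_parts.netloc is a str and wire_parts.netloc is bytes; the
decoded result compares equal to the one parsed from text.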