[Python-Dev] PEP 414

Sun Mar 4 20:35:30 CET 2012

On Sat, Mar 3, 2012 at 5:02 AM, Lennart Regebro <regebro at gmail.com> wrote:

> I'm not sure that's true at all. In most cases where you support both
> Python 2 and Python 3, most strings will be "native", ie, without
> prefix in either Python 2 or Python 3. The native case is the most
> common case.
>

Exactly.  The reason "native strings" even exist as a concept in WSGI was
to make it so that the idiomatic manipulation of header data in both Python
2 and 3 would use plain old string constants with no special wrappers or
markings.

What's thrown the monkey wrench in here for the WSGI case is the use of
unicode_literals.  If you simply skip using unicode_literals for WSGI code,
you should be fine with a single 2/3 codebase.  But then you need some way
to mark some things as unicode...  which is how we end up back at this PEP.

I suppose WSGI could have gone the route of using byte strings for headers
instead, but I'm not sure it would have helped.  The design goals for PEP
3333 were to sanely support both 2to3 and 2+3 single codebases, and WSGI
does actually do that...  for the code that's actually doing WSGI stuff.

Ironically enough, the effect of the WSGI API is that it's all the *non*
WSGI-specific code in the same module that ends up needing to mark its
strings as unicode...  or else it has to use unicode_literals and mark all
the WSGI code with str().  There's really no good way to deal with a
*mixed* WSGI/non-WSGI module, except to use explicit markers on one side or
the other.

Perhaps the simplest solution of all might be to just isolate direct WSGI
code in modules that don't import unicode_literals.  Web frameworks usually
hide WSGI stuff away from the user anyway, and many are already natively
unicode in their app-facing APIs.  So, if a framework or library
encapsulates WSGI in a str-safe/unicode-friendly API, this really shouldn't
be an issue for the library's users.  But I suppose somebody's got to port
the libraries first.  ;-)

If anyone's updating porting strategy stuff, a mention of this in the tips
regarding unicode_literals would be a good idea.  i.e., something like:

"If you have 2.x modules which work with WSGI and also contain explicit u''
strings, you should not use unicode_literals unless you are willing to
explicitly mark all WSGI environment and header strings as native strings
using 'str()'.  This is necessary because WSGI headers and environment
keys/values are defined as byte strings in Python 2.x, and unicode strings
in 3.x.  Alternatively, you may continue to use u'' strings if you are
targeting Python 3.3+ only, or can use the import or install hooks provided
for Python 3.2, or if you are using 2to3...  but in this case you should
not use unicode_literals."

That could probably be written a lot more clearly.  ;-)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20120304/a34939e1/attachment.html>