<div class="gmail_quote">On Sat, Mar 3, 2012 at 5:02 AM, Lennart Regebro <span dir="ltr">&lt;<a href="mailto:regebro@gmail.com">regebro@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">I'm not sure that's true at all. In most cases where you support both</div>
Python 2 and Python 3, most strings will be "native", ie, without<br>
prefix in either Python 2 or Python 3. The native case is the most<br>
common case.<br></blockquote><div><br></div><div>Exactly. The reason "native strings" even exist as a concept in WSGI was to make it so that the idiomatic manipulation of header data in both Python 2 and 3 would use plain old string constants with no special wrappers or markings.</div>
<div><br></div><div>What's thrown the monkey wrench in here for the WSGI case is the use of unicode_literals. If you simply skip using unicode_literals for WSGI code, you should be fine with a single 2/3 codebase. But then you need some way to mark some things as unicode... which is how we end up back at this PEP.</div>
<div><br></div><div>I suppose WSGI could have gone the route of using byte strings for headers instead, but I'm not sure it would have helped. The design goals for PEP 3333 were to sanely support both 2to3 and 2+3 single codebases, and WSGI does actually do that... for the code that's actually doing WSGI stuff.</div>
<div><br></div><div>Ironically enough, the effect of the WSGI API is that it's all the *non* WSGI-specific code in the same module that ends up needing to mark its strings as unicode... or else it has to use unicode_literals and mark all the WSGI code with str(). There's really no good way to deal with a *mixed* WSGI/non-WSGI module, except to use explicit markers on one side or the other.</div>
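<div><br></div><div>For illustration, here is a minimal sketch of the "use unicode_literals and mark all the WSGI code with str()" side of that trade-off. The application name and response body are made up for the example; only the str() wrapping is the point.</div>

```python
from __future__ import unicode_literals  # bare literals below are unicode on 2.x


def app(environ, start_response):
    # Non-WSGI text: plain literals are unicode on both 2.x and 3.x here.
    body = "Hello, world".encode("utf-8")

    # WSGI-facing strings must be native strings, so each one is wrapped in
    # str(). On 3.x that is a no-op; on 2.x it converts the unicode literal
    # to a byte str (which only works for ASCII content).
    status = str("200 OK")
    headers = [
        (str("Content-Type"), str("text/plain; charset=utf-8")),
        (str("Content-Length"), str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

<div>Without the future import, the same module would drop every str() call and instead need u'' on its non-WSGI text, which is the other side of the same choice.</div>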
<div><br></div><div>Perhaps the simplest solution of all might be to just isolate direct WSGI code in modules that don't import unicode_literals. Web frameworks usually hide WSGI stuff away from the user anyway, and many are already natively unicode in their app-facing APIs. So, if a framework or library encapsulates WSGI in a str-safe/unicode-friendly API, this really shouldn't be an issue for the library's users. But I suppose somebody's got to port the libraries first. ;-)</div>
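<div><br></div><div>A rough sketch of what that isolation might look like, assuming a hypothetical adapter module: make_wsgi_app, to_native and the handler signature are invented for this example and are not part of WSGI or any framework. The adapter has no unicode_literals import, so its own literals are native strings, while the handler it wraps can live in a module that does use unicode_literals.</div>

```python
# wsgi_adapter.py (hypothetical module name). There is deliberately no
# unicode_literals import here, so bare literals in this module are native
# strings on both Python 2 and Python 3, which is what WSGI expects.


def to_native(text):
    """Return header-safe text as a native str (hypothetical helper)."""
    if isinstance(text, str):
        return text                   # already native; always the case on 3.x
    return text.encode("iso-8859-1")  # 2.x unicode object becomes a byte str


def make_wsgi_app(handler):
    """Wrap handler(path), which returns (status, headers, text) as unicode text.

    The path is passed through as the native string WSGI supplies.
    """
    def app(environ, start_response):
        status, headers, text = handler(environ.get("PATH_INFO", "/"))
        body = text.encode("utf-8")   # response bodies are always bytes
        native_headers = [(to_native(k), to_native(v)) for k, v in headers]
        native_headers.append(("Content-Length", str(len(body))))
        start_response(to_native(status), native_headers)
        return [body]
    return app
```

<div>The handler never touches start_response or the header list directly, so it does not matter which kind of literal its own module uses.</div>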
<div><br></div><div>If anyone's updating the porting-strategy docs, it would be a good idea to mention this in the tips regarding unicode_literals, with something like:</div><div><br></div><div>"If you have 2.x modules which work with WSGI and also contain explicit u'' strings, you should not use unicode_literals unless you are willing to explicitly mark all WSGI environment and header strings as native strings using 'str()'. This is necessary because WSGI headers and environment keys/values are defined as native strings: byte strings in Python 2.x, and unicode strings in 3.x. Alternatively, you may keep your u'' strings if the only 3.x versions you target are 3.3+, or if you can use the import or install hooks provided for Python 3.2, or if you are using 2to3... but in any of these cases you should not use unicode_literals."</div>
<div><br></div><div>That could probably be written a lot more clearly. ;-)</div></div>