[Python-Dev] PEP 3333: wsgi_string() function

P.J. Eby pje at telecommunity.com
Fri Jan 7 18:04:46 CET 2011


At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:
>On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
> > I don't understand why you are attached to this horrible hack
> > (bytes-in-unicode). It introduces more work and more confusion than
> > using raw bytes unchanged.
> >
> > It doesn't work and so something has to be changed.
>
>It's gross but it does work. This has been discussed ad nauseam on 
>web-sig over a period of years.
>
>I'd like to reiterate that it is only even a potential issue for the 
>PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been 
>urldecoded already, into byte-data in some encoding. For all the 
>other keys (including the ones from os.environ), they are either 
>*properly* decoded in ISO-8859-1 or are just ASCII (possibly still 
>urlencoded, so the app needs to urldecode and decode into a string 
>with the correct encoding).

Right.  Also, it should be mentioned that none of this would be 
necessary if we could've gotten a "bytes of a known encoding" 
type.  If you look back to the last big Python-Dev discussion on 
bytes/unicode and stdlib API breakage, this was the holdup for 
getting a sane WSGI spec.

Since we couldn't change the language to fix the problem (due to the 
moratorium), we had to use this less-pleasant way of dealing with 
things, in order to get a final WSGI spec for Python 3.
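For readers unfamiliar with the bytes-in-unicode convention James describes for PATH_INFO, here is a minimal sketch, assuming a server that has the raw request path as bytes and an application that expects UTF-8. The helpers `server_path_info()` and `app_path_info()` are hypothetical names for illustration only; they are not the `wsgi_string()` function from the subject line and not part of any library. The only real stdlib call used is `urllib.parse.unquote_to_bytes()`.

```python
from urllib.parse import unquote_to_bytes


def server_path_info(raw_path: bytes) -> str:
    """Hypothetical server-side helper (illustration only)."""
    # Undo %XX escapes without committing to any text encoding.
    path_bytes = unquote_to_bytes(raw_path)
    # latin-1 maps each byte 0x00-0xFF to the code point with the same value,
    # so this decode can never fail and loses no information.
    return path_bytes.decode("latin-1")


def app_path_info(environ: dict, encoding: str = "utf-8") -> str:
    """Hypothetical application-side helper (illustration only)."""
    # Re-encoding as latin-1 recovers the exact original bytes; the app then
    # decodes them with whatever encoding it actually expects.
    return environ["PATH_INFO"].encode("latin-1").decode(encoding)


environ = {"PATH_INFO": server_path_info(b"/caf%C3%A9")}
print(ascii(environ["PATH_INFO"]))  # '/caf\xc3\xa9' (two latin-1 chars, not "é")
print(ascii(app_path_info(environ)))  # '/caf\xe9' (i.e. "/café")
```

Because latin-1 is a one-to-one mapping between bytes and the first 256 code points, the server never has to guess the path's real encoding. The cost is that the str in `environ` looks like mojibake until the application re-encodes it, which is the "gross but it does work" trade-off.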

(If anybody is wondering about the specifics of the language change 
that was needed: it would have been a "bytes with known encoding" type 
that, when combined in any polymorphic operation with a unicode 
string, would produce bytes-with-encoding output, raising an error 
if the resulting value could not be encoded in the target 
encoding.  Then we would simply do all WSGI header operations with 
this type, using latin-1 as the target encoding.)
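The type described in the parenthetical never existed in Python. A rough sketch of the intended behaviour, limited to the `+` operator, might look like the following. `EncodedBytes` and its `encoding` attribute are hypothetical names invented for illustration, and the sketch deliberately ignores the many other str/bytes operations (slicing, `%`, `join()`, comparisons) that a real language-level type would have to cover.

```python
class EncodedBytes(bytes):
    """Hypothetical 'bytes with known encoding' type (illustration only).

    Only + is handled here; slicing, .upper(), %, join() and so on still
    return plain bytes.
    """

    def __new__(cls, value, encoding="latin-1"):
        if isinstance(value, str):
            # Encoding up front raises UnicodeEncodeError for unencodable text.
            value = value.encode(encoding)
        obj = super().__new__(cls, value)
        obj.encoding = encoding
        return obj

    def _coerce(self, other):
        """Turn the other operand into raw bytes in this object's encoding."""
        if isinstance(other, str):
            return other.encode(self.encoding)  # may raise UnicodeEncodeError
        if isinstance(other, (bytes, bytearray)):
            return bytes(other)
        return NotImplemented

    def __add__(self, other):
        other_bytes = self._coerce(other)
        if other_bytes is NotImplemented:
            return NotImplemented
        return EncodedBytes(bytes(self) + other_bytes, self.encoding)

    def __radd__(self, other):
        # Reached for str + EncodedBytes and bytes + EncodedBytes, since
        # neither str nor bytes defines numeric addition.
        other_bytes = self._coerce(other)
        if other_bytes is NotImplemented:
            return NotImplemented
        return EncodedBytes(other_bytes + bytes(self), self.encoding)


header = "Content-Type: " + EncodedBytes("text/plain")
print(type(header).__name__, bytes(header))  # EncodedBytes b'Content-Type: text/plain'
try:
    header + "; title=\u20ac"  # the euro sign has no latin-1 encoding
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)
```

Mixing text and bytes this way yields bytes tagged with their encoding, and anything that cannot be represented in latin-1 fails immediately instead of silently producing a bad header.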

