[Python-Dev] PEP 3333: wsgi_string() function
P.J. Eby
pje at telecommunity.com
Fri Jan 7 18:04:46 CET 2011
At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:
>On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
> > I don't understand why you are attached to this horrible hack
> > (bytes-in-unicode). It introduces more work and more confusion than
> > using raw bytes unchanged.
> >
> > It doesn't work and so something has to be changed.
>
>It's gross, but it does work. This has been discussed ad nauseam on
>web-sig over a period of years.
>
>I'd like to reiterate that this is only a potential issue at all for
>the PATH_INFO and SCRIPT_NAME keys. Those two keys are required to
>have been URL-decoded already, into bytes in some encoding. All the
>other keys (including the ones from os.environ) are either
>*properly* decoded as ISO-8859-1 or are plain ASCII (possibly still
>URL-encoded, in which case the app must URL-decode them and decode
>the result into a string with the correct encoding).
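The round trip described above can be sketched in Python 3 as follows. This is a minimal illustration assuming a UTF-8 application; the helper names `make_path_info` and `app_path` are hypothetical and are not part of PEP 3333 or of any server. The server URL-decodes the raw path to bytes and stores them in PATH_INFO decoded as latin-1; the application re-encodes with latin-1 to recover the exact bytes, then decodes them with the encoding it actually uses.

```python
from urllib.parse import parse_qs, unquote_to_bytes


def make_path_info(raw_path: bytes) -> str:
    # Hypothetical server-side helper: URL-decode the raw request path to
    # bytes, then store them as a latin-1 "bytes-in-unicode" native string.
    return unquote_to_bytes(raw_path).decode("latin-1")


def app_path(environ: dict, encoding: str = "utf-8") -> str:
    # Hypothetical app-side helper: re-encode as latin-1 to get the exact
    # original bytes back, then decode them with the app's real encoding.
    return environ["PATH_INFO"].encode("latin-1").decode(encoding)


# A request for /café whose path the client percent-encoded as UTF-8.
environ = {
    "PATH_INFO": make_path_info(b"/caf%C3%A9"),
    # Other keys such as QUERY_STRING stay URL-encoded, plain ASCII.
    "QUERY_STRING": "q=%C3%A9t%C3%A9",
}

print(ascii(environ["PATH_INFO"]))  # '/caf\xc3\xa9': one code point per byte
print(app_path(environ))            # /café
# For still-encoded keys, the app URL-decodes and decodes in one step.
print(parse_qs(environ["QUERY_STRING"], encoding="utf-8"))  # {'q': ['été']}
```

Because latin-1 maps all 256 byte values one-to-one onto U+0000 to U+00FF, the `encode("latin-1")` step cannot fail on a server-built PATH_INFO; the catch is that the string is mojibake until the app re-decodes it.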

Right. Also, it should be mentioned that none of this would be
necessary if we could have gotten a "bytes with known encoding"
type. If you look back to the last big Python-Dev discussion on
bytes/unicode and stdlib API breakage, this was the holdup for
getting a sane WSGI spec.

Since we couldn't change the language to fix the problem (due to the
PEP 3003 language moratorium), we had to use this less pleasant way
of dealing with things in order to get a final WSGI spec for Python 3.

(If anybody is wondering about the specifics of the language change
that was needed, it would have been a "bytes with known encoding"
type that, when combined in any polymorphic operation with a unicode
string, would produce bytes-with-encoding output, and would raise
an error if the resulting value could not be encoded in the target
encoding. Then we would simply do all WSGI header operations with
this type, using latin-1 as the target encoding.)
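As a rough illustration of the type described above, here is a hypothetical Python sketch; nothing like `EncodedBytes` exists in Python or in PEP 3333, and the name is invented. It subclasses `bytes`, records a target encoding (latin-1 by default, matching the WSGI header case), and implements only `+`: mixing it with a `str` encodes the text, returns another `EncodedBytes`, and raises `UnicodeEncodeError` when the text is not encodable. A full version would need the same handling for `join()`, `%`-formatting, `replace()` and the other polymorphic operations.

```python
class EncodedBytes(bytes):
    """Hypothetical "bytes with known encoding" type (not part of Python).

    Concatenating it with str encodes the text using the target encoding,
    so the result stays EncodedBytes; text that cannot be encoded raises
    UnicodeEncodeError instead of silently producing bad data.
    """

    def __new__(cls, data: bytes, encoding: str = "latin-1"):
        obj = super().__new__(cls, data)
        obj.encoding = encoding  # target encoding for mixed operations
        return obj

    def _coerce(self, other):
        if isinstance(other, str):
            return other.encode(self.encoding)  # may raise UnicodeEncodeError
        if isinstance(other, (bytes, bytearray)):
            return bytes(other)
        return NotImplemented

    def __add__(self, other):
        data = self._coerce(other)
        if data is NotImplemented:
            return NotImplemented
        return EncodedBytes(bytes(self) + data, self.encoding)

    def __radd__(self, other):
        # Called for "text" + EncodedBytes: str does not handle bytes,
        # so Python falls back to this reflected method.
        data = self._coerce(other)
        if data is NotImplemented:
            return NotImplemented
        return EncodedBytes(data + bytes(self), self.encoding)


status = EncodedBytes(b"200 ") + "OK"
header = "Content-Type: " + EncodedBytes(b"text/plain")
print(type(header).__name__, bytes(header))  # EncodedBytes b'Content-Type: text/plain'

try:
    EncodedBytes(b"X-Price: ") + "\u20ac5"  # the euro sign is not in latin-1
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)  # rejected: ordinal not in range(256)
```

Because the check happens at the moment text is mixed in, a header value that cannot be sent as latin-1 fails where it is built rather than later inside the server.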