[Web-SIG] Request for Comments on upcoming WSGI Changes
and-py at doxdesk.com
Mon Sep 21 20:15:28 CEST 2009
Armin Ronacher wrote:
> The middleware can never know.
It's much more likely to know than the server, though!
> WSGI will demand UTF-8 URLs and only
> provide iso-XXX support for backwards compatibility.
It doesn't sound much like backwards compatibility to me if non-UTF-8
URLs break as soon as they coincidentally happen to be UTF-8 byte
sequences. I'm as much an advocate of "UTF-8 for everything everywhere!"
as anyone else, but unfortunately today there are still dark places
where you need non-UTF-8 URLs.
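To make the coincidence concrete, here is a minimal Python 3 sketch. The helper name decode_utf8_first and the sample bytes are my own illustration, not anything from the proposal; it only assumes a server that tries UTF-8 first and falls back to iso-8859-1.

```python
def decode_utf8_first(data):
    """Hypothetical server-side policy: try UTF-8, fall back to iso-8859-1."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode("iso-8859-1"), "iso-8859-1"


# An iso-8859-1 path segment that is not valid UTF-8: the fallback kicks in
# and the application sees the characters it expected.
print(decode_utf8_first(b"caf\xe9"))

# An iso-8859-1 path segment (U+00C3 U+00A9) whose bytes happen to form a
# valid UTF-8 sequence: the server silently picks UTF-8 and the application
# gets U+00E9, a different string from the one the site intended.
print(decode_utf8_first("\u00c3\u00a9".encode("iso-8859-1")))
```

The second call is the breakage case: nothing fails, so nothing warns the application that the URL was reinterpreted.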
Incidentally, if wsgi.uri_encoding is going to be the way to signal that
the server has decoded bytes to characters using a known encoding, it
should be stressed that this should only be set when that encoding is
known for certain.
That is, wsgi.uri_encoding should be omitted (or None?) in cases where
another party has already decoded (and maybe mangled) the bytes using an
unknown encoding. In particular, CGI.
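As a rough sketch of how an application might consume such a key, assuming the proposed wsgi.uri_encoding name from this thread and Python 3 str values in environ (the function name path_in_app_encoding and its return convention are hypothetical):

```python
def path_in_app_encoding(environ, app_encoding="utf-8"):
    """Return (path, reliable) for PATH_INFO in the application's encoding.

    Assumes the proposed 'wsgi.uri_encoding' key; None or a missing key
    means the original bytes cannot be recovered with any confidence.
    """
    path = environ.get("PATH_INFO", "")
    server_encoding = environ.get("wsgi.uri_encoding")
    if server_encoding is None:
        # Decoded by an unknown party (e.g. CGI): do not guess.
        return path, False
    # Undo the server's decode to get the original bytes back.
    raw = path.encode(server_encoding)
    try:
        return raw.decode(app_encoding), True
    except UnicodeDecodeError:
        # The bytes are not valid in the application's encoding.
        return path, False


environ = {"PATH_INFO": "/caf\u00c3\u00a9", "wsgi.uri_encoding": "iso-8859-1"}
print(path_in_app_encoding(environ))
```

The round trip is only safe because the server states which encoding it used; with the key absent, the function reports the path as unreliable rather than re-encoding with a guess.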
(In the case of Windows CGI the server will have decoded URI bytes into
Unicode characters, using a charset which it is impossible to find out.
In Apache it's iso-8859-1; in IIS it's UTF-8 as long as it was a valid
UTF-8 sequence, otherwise it's the system codepage. This problem affects
the non-CGI implementation isapi_wsgi, too. Then the variables are read
as environment variables, which for Python 2 means another encode/decode
step on Windows using the system codepage, mangling non-codepage
characters. Python 3 has the opposite problem reading byte envvars using
UTF-8, which won't be how Apache put them there.)
If wsgi.uri_encoding is obligatory then in reality it will often be wrong,
leaving us in the same pathetic predicament as with WSGI 1.0, where
non-ASCII URIs don't work reliably at all.