[Web-SIG] Request for Comments on upcoming WSGI Changes
and-py at doxdesk.com
Mon Sep 21 14:49:50 CEST 2009
> A middleware might re-decode the values if the `wsgi.uri_encoding` is
> `iso-8859-1` and only then.
Seems like a mistake. If the middleware knows iso-8859-7 is in use, it
would need to transcode the charset regardless of whether the
initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise
the application would break when fed with eg. Greek words that happened
to encode to valid UTF-8 bytes.
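
To make that concrete, here is a minimal Python sketch. It assumes the server has already decoded the raw bytes as UTF-8, and that the middleware knows by some out-of-band means that the real charset is iso-8859-7. The `transcode` helper is a hypothetical name used only for illustration; it is not part of any WSGI draft.

```python
def transcode(value, assumed_encoding, real_encoding):
    """Undo a decode done with assumed_encoding and redo it with real_encoding."""
    return value.encode(assumed_encoding).decode(real_encoding)

# Two Greek capitals (pi, omicron with tonos) encoded as iso-8859-7.
raw = "\u03a0\u038c".encode("iso-8859-7")   # b"\xd0\xbc"

# The same two bytes are also a valid UTF-8 sequence, so a UTF-8
# sniffer accepts them and produces one Cyrillic letter (U+043C).
sniffed = raw.decode("utf-8")

# The middleware has to transcode even though the sniffer reported utf-8.
fixed = transcode(sniffed, "utf-8", "iso-8859-7")
```

If the middleware only re-decoded when `wsgi.uri_encoding` was `iso-8859-1`, `sniffed` would reach the application unchanged.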
> The application MUST use this value to decode the ``'QUERY_STRING'``
> as well.
This will break all use of non-UTF-8 encodings in QUERY_STRING, where
the path part of the URL does not contain non-UTF-8 sequences. That
includes the very common case where the path part contains only ASCII.
Such a request will fail, as the given UTF-8 sniffer only looks at the path part to
determine what encoding to use for both the path part and the query
string. I don't think WSGI should mandate any particular decoding of the
query string.
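
A rough Python sketch of the failure mode, under stated assumptions: `sniff_uri_encoding` is a hypothetical stand-in for the sniffer described in the draft (UTF-8 if the path bytes are valid UTF-8, otherwise iso-8859-1), and the path `/search` and query `q=%E9` are made-up values.

```python
from urllib.parse import unquote_to_bytes

def sniff_uri_encoding(raw_path):
    """Guess an encoding from the path bytes only (hypothetical sniffer)."""
    try:
        raw_path.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"

raw_path = unquote_to_bytes("/search")   # pure ASCII path
raw_query = unquote_to_bytes("q=%E9")    # e-acute in iso-8859-1

# ASCII is valid UTF-8, so the sniffer settles on utf-8 for everything.
encoding = sniff_uri_encoding(raw_path)

# Applying that choice to the query string then blows up.
try:
    query = raw_query.decode(encoding)
    query_ok = True
except UnicodeDecodeError:
    query_ok = False
```

The query bytes decode fine as iso-8859-1, but nothing in the path gives the sniffer a reason to pick that.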
To be honest, I'm still uncomfortable with any use of Unicode strings in
WSGI. But if we're going to do it, I'd go for consistency. Treating the
decoding of the URL specially is a nasty hack that is only there because
the CGI spec stupidly requires %-decoding to be done on PATH_INFO and
So why not go with (the long-ago suggested) optional variables like
'wsgi.real_path_info' that, if present, are the original strings before
%-decoding? Now it doesn't greatly matter what string types and
encodings we pick, because everything will be ASCII anyway. It also
solves the %2F problem.
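
A small Python sketch of how an application might use such a variable. The key name `wsgi.real_path_info` is just the suggestion above, not an agreed part of any spec, and `path_segments` is a hypothetical helper.

```python
from urllib.parse import unquote

def path_segments(environ):
    """Split the path into segments, keeping an encoded slash inside a segment."""
    raw = environ.get("wsgi.real_path_info")   # hypothetical, still %-encoded
    if raw is None:
        # Only the already-decoded PATH_INFO is available, so a %2F in the
        # original URL can no longer be told apart from a real separator.
        return environ.get("PATH_INFO", "").split("/")[1:]
    # Split first, then %-decode each segment on its own.
    return [unquote(segment) for segment in raw.split("/")[1:]]

with_raw = path_segments({"wsgi.real_path_info": "/a%2Fb/c"})
without_raw = path_segments({"PATH_INFO": "/a/b/c"})
```

With the raw value the application sees two segments, `a/b` and `c`; with only PATH_INFO it sees three.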
If those variables are not present (typically for CGI environments that
cannot provide them), the application/framework *may* try to recover
non-ASCII characters from PATH_INFO/QUERY_STRING, with undefined
results. This is the broken-but-sometimes-rescuable status quo for CGI:
by the time Python reads non-ASCII characters out of the environment
they may already have been mangled by up to two conversion processes.
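
For illustration, a best-effort recovery might look like the Python sketch below. It assumes the environment value arrived as a string that was decoded byte-for-byte as iso-8859-1, which may not hold on a given server; `recover_text` is a hypothetical helper and its result is only a guess.

```python
def recover_text(value, candidates=("utf-8",)):
    """Try to recover non-ASCII text from a value assumed to be iso-8859-1 decoded."""
    try:
        raw = value.encode("iso-8859-1")   # get back the presumed original bytes
    except UnicodeEncodeError:
        return value                       # not a byte-for-byte decode; give up
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return value                           # nothing fit; keep the value as received

recovered = recover_text("/caf\u00c3\u00a9")   # UTF-8 bytes seen through iso-8859-1
unchanged = recover_text("/caf\u00e9")         # genuine iso-8859-1 stays as it was
```

If the bytes were altered before Python saw them, no choice of `candidates` gets the original text back.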
mailto:and at doxdesk.com