[Web-SIG] Request for Comments on upcoming WSGI Changes

And Clover and-py at doxdesk.com
Mon Sep 21 14:49:50 CEST 2009


 > A middleware might re-decode the values if the `wsgi.uri_encoding` is
 > `iso-8859-1` and only then.

Seems like a mistake. If the middleware knows iso-8859-7 is in use, it 
would need to transcode the charset regardless of whether the 
initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise 
the application would break when fed, e.g., Greek words that happened 
to encode to valid UTF-8 bytes.
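
A minimal sketch of that failure, in Python 3. The `sniff_decode` helper is 
hypothetical: it only stands in for a server that tries UTF-8 first and falls 
back to iso-8859-1, which is my reading of the proposal rather than code from 
it. The two-letter Greek string is chosen only because its iso-8859-7 bytes 
also happen to be well-formed UTF-8.

```python
def sniff_decode(raw: bytes):
    """Hypothetical server-side sniffer: try UTF-8, else fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


# GREEK CAPITAL LETTER GAMMA followed by GREEK CAPITAL LETTER EPSILON WITH TONOS.
greek = "\u0393\u0388"

# The bytes a client using iso-8859-7 would put on the wire.
wire_bytes = greek.encode("iso-8859-7")

# The sniffer sees a well-formed UTF-8 sequence and reports "utf-8",
# so the decoded text is a Latin letter, not the Greek that was sent.
text, encoding = sniff_decode(wire_bytes)
print(encoding, ascii(text))

# A middleware that re-decodes only when the encoding is "iso-8859-1"
# would leave `text` alone. Transcoding unconditionally from the charset
# it knows to be in use recovers the original string.
fixed = text.encode(encoding).decode("iso-8859-7")
print(fixed == greek)
```

The point is only that the reported encoding says nothing about what the 
client meant, so it cannot be the trigger for transcoding.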

 > The application MUST use this value to decode the ``'QUERY_STRING'``
 > as well.

This will break all use of non-UTF-8 encodings in QUERY_STRING whenever 
the path part of the URL does not contain non-UTF-8 sequences. That 
includes the very common case where the path part contains only ASCII.

     http://greek.example.com/myscript.cgi?x=%C2

will fail, as the given UTF-8 sniffer only looks at the path part to 
determine what encoding to use for both the path part and the query 
string. I don't think WSGI should mandate any particular decoding of the 
QUERY_STRING.
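
Roughly what happens to the URL above, sketched in Python 3. The 
`sniff_path_encoding` function is a hypothetical stand-in for the proposed 
sniffer, not its actual code; the only thing taken from the proposal is that 
it inspects the path and the result is then applied to the query string.

```python
from urllib.parse import unquote_to_bytes, urlsplit


def sniff_path_encoding(path_bytes: bytes) -> str:
    """Hypothetical sniffer: decide the encoding from the path part only."""
    try:
        path_bytes.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


parts = urlsplit("http://greek.example.com/myscript.cgi?x=%C2")

# An ASCII-only path is trivially valid UTF-8, so the sniffer picks utf-8.
sniffed = sniff_path_encoding(unquote_to_bytes(parts.path))

# The query value is the single byte 0xC2, a lone UTF-8 lead byte.
value_bytes = unquote_to_bytes(parts.query.split("=", 1)[1])

try:
    value_bytes.decode(sniffed)
    query_decoded = True
except UnicodeDecodeError:
    query_decoded = False

# Decoded with the charset the site actually uses, the value is fine.
intended = value_bytes.decode("iso-8859-7")
print(sniffed, query_decoded, ascii(intended))
```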

To be honest, I'm still uncomfortable with any use of Unicode strings in 
WSGI. But if we're going to do it, I'd go for consistency. Treating the 
decoding of the URL specially is a nasty hack that is only there because 
the CGI spec stupidly requires %-decoding to be done on PATH_INFO and 
SCRIPT_NAME.

So why not go with (the long-ago suggested) optional variables like 
'wsgi.real_path_info' that, if present, are the original strings before 
%-decoding? Now it doesn't greatly matter what string types and 
encodings we pick, because everything will be ASCII anyway. It also 
solves the %2F problem.
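
To illustrate, a small Python 3 sketch. The key 'wsgi.real_path_info' is 
only the suggestion above, not part of any published WSGI spec, and the 
example path and the `segments_from_*` helpers are made up for the 
illustration. UTF-8 is assumed for the final per-segment decoding purely for 
the demo; with the raw string available, that choice is the application's.

```python
from urllib.parse import unquote

# Hypothetical environ: the raw variable is the path before %-decoding.
raw_path = "/files/a%2Fb/caf%C3%A9"
environ = {
    "PATH_INFO": unquote(raw_path),      # CGI-style, already %-decoded
    "wsgi.real_path_info": raw_path,     # suggested optional variable
}


def segments_from_decoded(env):
    """Split the already-decoded path: an escaped slash is lost."""
    return env["PATH_INFO"].strip("/").split("/")


def segments_from_raw(env):
    """Split on literal slashes first, then %-decode each segment."""
    raw = env["wsgi.real_path_info"]
    return [unquote(seg) for seg in raw.strip("/").split("/")]


# The raw form is pure ASCII, so its string type and encoding hardly matter.
raw_path.encode("ascii")

print(segments_from_decoded(environ))
print(segments_from_raw(environ))
```

Splitting before decoding is what keeps the %2F inside its segment; once 
PATH_INFO has been decoded, the two kinds of slash cannot be told apart.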

If those variables are not present (typically for CGI environments that 
cannot provide them), the application/framework *may* try to recover 
non-ASCII characters from PATH_INFO/QUERY_STRING, with undefined 
results. This is the broken-but-sometimes-rescuable status quo for CGI: 
by the time Python reads non-ASCII characters out of the environment 
they may already have been mangled by up to two conversion processes.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/


