[Web-SIG] Draft PEP: WSGI 1.1
And Clover
and-py at doxdesk.com
Thu Apr 15 17:30:59 CEST 2010
Dirkjan Ochtman wrote:
> 1. The application is passed an instance of a Python dictionary
> containing what is referred to as the WSGI environment. All keys
> in this dictionary are native strings. For CGI variables, all names
> are going to be ISO-8859-1 and so where native strings are
> unicode strings, that encoding is used for the names of CGI
> variables.
Perhaps explain where those ISO-8859-1 bytes might come from:
...are native strings. Where native strings are Unicode, any
keys derived from byte-oriented sources (such as custom headers
in the HTTP request reflected in the CGI environment variables)
should be decoded using the ISO-8859-1 encoding.
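As a rough illustration of that wording, a gateway that receives its CGI variables as bytes might decode them like this. This is only a sketch: decode_cgi_environ and the sample header are made-up names, not anything taken from the draft.

```python
def decode_cgi_environ(raw_environ):
    """Decode a bytes-to-bytes mapping into native (Unicode) strings.

    ISO-8859-1 maps every byte value to exactly one code point, so
    this decoding can never fail and never loses information.
    """
    return {
        key.decode("iso-8859-1"): value.decode("iso-8859-1")
        for key, value in raw_environ.items()
    }


# Hypothetical custom header whose value is the UTF-8 encoding of "café".
raw = {b"HTTP_X_CUSTOM": b"caf\xc3\xa9"}
environ = decode_cgi_environ(raw)
```

Encoding a decoded value back with ISO-8859-1 returns the original bytes unchanged.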
> 3. For the CGI variables contained in the WSGI environment, the values
> of the variables are native strings. Where native strings are
> unicode strings, ISO-8859-1 encoding would be used such that the
> original character data is preserved and as necessary the unicode
> string can be converted back to bytes and thence decoded to unicode
> again using a different encoding.
Good. The only problem that remains with this is that in certain
environments (notably: all IIS use, not just CGI) a WSGI gateway cannot
fully comply with this requirement. Should we:
a. disallow environments that cannot be sure they are preserving the
original byte data from declaring that they support wsgi.version 1.1?
b. add an extra wsgi.something flag for a WSGI server to add, to specify
that it is sure that the original bytes have been preserved? (i.e. so
wsgiref's CGI handler would have to declare it wasn't sure when running
under Windows.)
c. just let WSGI gateways silently ignore the ISO-8859-1 requirement if
they can't honour it and let the application spend its time trying to
unravel the mess (status quo).
(Can wsgiref be fixed to use ISO-8859-1 in time for Python 3.2?)
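To make option (b) concrete, here is a sketch of how an application might use such a flag before attempting the ISO-8859-1 round trip from point 3. No name has been agreed for the flag, so the key wsgi.original_bytes_preserved below is purely hypothetical, as are the helpers recode and path_info.

```python
# Hypothetical environ key; the real name (if any) is undecided.
PRESERVED_FLAG = "wsgi.original_bytes_preserved"


def recode(value, encoding="utf-8"):
    """Recover the original bytes, then decode them as `encoding`."""
    return value.encode("iso-8859-1").decode(encoding)


def path_info(environ, encoding="utf-8"):
    """Return PATH_INFO, recoding it only when the gateway vouches for it."""
    raw = environ.get("PATH_INFO", "")
    if environ.get(PRESERVED_FLAG, False):
        return recode(raw, encoding)
    # The gateway could not promise byte preservation (e.g. under IIS),
    # so a round trip might corrupt the value; leave it untouched.
    return raw
```

An application written this way at least knows when it is guessing, which is the point of the flag.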
> 7. The iterable returned by the application and from which response
> content is derived, should yield byte strings. Where native strings
> are unicode strings, the native string type can also be returned in
> which case it would be encoded as ISO-8859-1.
> 8. The value passed to the 'write()' callback returned by
> 'start_response()' should be a byte string. Where native strings
> are unicode strings, a native string type can also be supplied, in
> which case it would be encoded as ISO-8859-1.
Weren't we going to only allow US-ASCII for the output? (These threads
are always so far apart I can never remember what conclusion we
reached... if any.)
Whilst ISO-8859-1 is in the HTTP standard for headers, and required to
preserve bytes in input, it's much, much less likely that the response
body is going to be ISO-8859-1. It could maybe be cp1252, but more
likely the author wanted UTF-8.
If we must support Unicode strings for response body output at all, I'd
prefer to be conservative here and spit a UnicodeEncodeError straight
away, rather than quietly mangle characters U+0080 to U+00FF.
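A sketch of the conservative behaviour I mean, assuming a server-side helper (encode_body_chunk is an invented name) that is applied to each item the application yields:

```python
def encode_body_chunk(chunk):
    """Return a response body chunk as bytes.

    Byte strings pass through untouched. Unicode strings are accepted
    only if they are pure US-ASCII; anything else raises
    UnicodeEncodeError immediately instead of being guessed at.
    """
    if isinstance(chunk, bytes):
        return chunk
    return chunk.encode("ascii")


text = "caf\u00e9"
lenient = text.encode("iso-8859-1")   # what the draft would send
intended = text.encode("utf-8")       # what the author probably wanted
```

With the draft's ISO-8859-1 rule the first encoding goes out silently; with the strict rule the application author sees the error on the first non-ASCII character.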
Manlio Perillo wrote:
> The run_with_cgi sample function should be changed, since it probably
> does not work correctly, on Python 3.x.
Yes, the 'URL Reconstruction' fragment will be wrong too, since it uses
urllib.quote() (urllib.parse.quote() on Python 3) to encode the path
part. On Python 3, quote() defaults to UTF-8 rather than the ISO-8859-1
that WSGI 1.1 requires.
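For illustration, the difference on Python 3 looks like this. The wrapper quote_wsgi_path is a made-up name; the encoding argument is the standard one accepted by urllib.parse.quote().

```python
from urllib.parse import quote


def quote_wsgi_path(path):
    """Percent-encode a WSGI 1.1 native-string path.

    The path is a Unicode string holding ISO-8859-1-decoded bytes, so
    it has to be encoded with the same codec to get those bytes back.
    """
    return quote(path, encoding="iso-8859-1")


# PATH_INFO as a gateway would present the UTF-8 bytes of "/café".
path_info = b"/caf\xc3\xa9".decode("iso-8859-1")

correct = quote_wsgi_path(path_info)   # '/caf%C3%A9'
mangled = quote(path_info)             # '/caf%C3%83%C2%A9' (UTF-8 default)
```

The default call double-encodes the path, so the reconstructed URL no longer matches what the client requested.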
--
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/