[Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

And Clover and-py at doxdesk.com
Thu Dec 3 19:35:14 CET 2009


Manlio Perillo wrote:

> However what about URI (that is, for PATH_INFO and the like)?
> For URI (if I remember correctly) the suggested encoding is UTF-8, so
> URLS should be decoded using

>   url.decode('utf-8', 'surrogateescape')

> Is this correct?

The currently-discussed proposal is ISO-8859-1, allowing the real bytes 
to be trivially extracted. This is consistent with the other headers and 
would be my preferred approach.
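
To illustrate what "trivially extracted" means here, a minimal sketch, assuming a server that decodes the request path as ISO-8859-1 before putting it in environ. The helper name recover_path_bytes and the sample path are made up for the example, and the choice to re-decode as UTF-8 is the application's, not something WSGI would mandate:

```python
def recover_path_bytes(path_info: str) -> bytes:
    # Each code point U+0000..U+00FF maps to exactly one ISO-8859-1 byte,
    # so encoding undoes the server's decode without loss.
    return path_info.encode('iso-8859-1')


# Hypothetical raw request path: "/café" sent as UTF-8 bytes.
raw = b'/caf\xc3\xa9'

# What a server following the ISO-8859-1 proposal would hand to the app.
environ = {'PATH_INFO': raw.decode('iso-8859-1')}

path_bytes = recover_path_bytes(environ['PATH_INFO'])

# The application then applies whatever encoding it actually expects.
path_text = path_bytes.decode('utf-8')
```

The string in environ looks like mojibake if printed directly, but no information has been lost, which is the point of the proposal.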

Python 3.1's wsgiref.simple_server, on the other hand, blindly uses 
urllib.parse.unquote, which defaults to UTF-8 without surrogateescape, 
mangling any non-UTF-8 input.
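
The mangling is easy to reproduce with the standard library alone. This is only a sketch: the sample path is invented, and it shows the behaviour of urllib.parse in current Python 3, where unquote defaults to encoding='utf-8' and errors='replace':

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical percent-encoded path whose byte is ISO-8859-1, not UTF-8.
quoted = '/caf%E9'

# Default call: the lone 0xE9 byte is not valid UTF-8, so it is replaced
# with U+FFFD and the original byte can no longer be recovered.
mangled = unquote(quoted)

# Byte-preserving alternatives.
raw = unquote_to_bytes(quoted)                        # bytes, no decoding
as_latin1 = unquote(quoted, encoding='iso-8859-1')    # one char per byte
escaped = unquote(quoted, errors='surrogateescape')   # bad byte -> U+DCE9
```

Either of the last two results can be turned back into the original bytes; the default result cannot.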

I don't really care whether UTF-8+surrogateescape or ISO-8859-1 encoding 
is blessed. But *something* needs to be blessed. An encoding, an 
alternative undecoded path_info, both, something else... just *something*.
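
For comparison with the ISO-8859-1 option, here is a rough sketch of the UTF-8+surrogateescape round trip. The sample bytes are made up; they deliberately mix a stray ISO-8859-1 byte with a valid UTF-8 sequence:

```python
# Hypothetical path bytes: 0xE9 is a lone ISO-8859-1 byte (invalid as
# UTF-8 here), while E6 97 A5 is the valid UTF-8 encoding of U+65E5.
raw = b'/caf\xe9/\xe6\x97\xa5'

# Valid sequences decode normally; each undecodable byte 0xNN becomes
# the lone surrogate U+DCNN instead of raising or being replaced.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same error handler restores the original bytes.
restored = text.encode('utf-8', 'surrogateescape')
```

So this option is lossless too. The difference is that well-formed UTF-8 paths arrive already readable, at the cost of strings that contain lone surrogates and cannot be encoded with the strict handler.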

> Let's consider the `wsgiref.util.application_uri` function
> There is a potential problem, here, with the quote function.

Yes. wsgiref is broken in Python 3.1. Not quite as broken as it was in 
3.0, but still broken. Until we can come to a Pronouncement on what WSGI 
*is* in Python 3, it is meaningless anyway.

> Cookie data SHOULD be transparent to the server/gateway; however WSGI is
> going to assume that data is encoded in latin-1.

Yeah. This is no big deal because non-ASCII characters in cookies are 
already broken everywhere(*). Given this and other limitations on what 
characters can go in cookies, they are habitually encoded using ad-hoc 
mechanisms handled by the application (typically a round of URL-encoding).
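
A minimal sketch of that kind of ad-hoc mechanism, assuming the application picks UTF-8 plus percent-encoding for its own cookie values. The function names are hypothetical, not part of any cookie library:

```python
from urllib.parse import quote, unquote


def encode_cookie_value(value: str) -> str:
    # safe='' escapes '/' as well, so only unreserved ASCII characters
    # remain and the browser never sees a non-ASCII byte.
    return quote(value, safe='', encoding='utf-8')


def decode_cookie_value(value: str) -> str:
    # errors='strict' makes a tampered or truncated value raise
    # UnicodeDecodeError instead of silently decoding to garbage.
    return unquote(value, encoding='utf-8', errors='strict')


wire = encode_cookie_value('na\u00efve; caf\u00e9')
back = decode_cookie_value(wire)
```

Because the value on the wire is pure ASCII, it does not matter which encoding the browser or the gateway assumes for the header.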

*: in particular:

- Opera and Chrome send non-ASCII cookie characters in UTF-8.
- IE encodes using the system codepage (which can never be UTF-8),
   mangling any characters that don't fit in the codepage through the
   traditional Windows 'similar replacement character' scheme.
- Mozilla uses the low byte of each UTF-16 code unit (so ISO-8859-1
   gets through but everything else is mangled)
- Safari refuses to send any cookie containing non-ASCII characters.
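
To make two of those behaviours concrete, here is a rough simulation of what ends up on the wire. It only models the list above as described, not actual browser code, and the helper name low_bytes_of_utf16 is invented. The IE codepage substitution is left out because Python has no direct equivalent of it:

```python
def low_bytes_of_utf16(s: str) -> bytes:
    # In little-endian UTF-16 the low byte of each 16-bit unit comes
    # first, so taking every second byte keeps exactly the low bytes.
    return s.encode('utf-16-le')[0::2]


# U+00E9 fits in ISO-8859-1; U+20AC (euro sign) does not.
value = '\u00e9\u20ac'

# Opera/Chrome as described: plain UTF-8 bytes.
utf8_wire = value.encode('utf-8')

# Mozilla as described: high byte of each code unit is dropped.
mozilla_wire = low_bytes_of_utf16(value)
```

In the second case the first byte is still the ISO-8859-1 byte for U+00E9, while the euro sign has collapsed to 0xAC, which reads back as a different character.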

> I don't know what the HTTP/Cookie spec says about this.

The traditional interpretation of RFC2616 is that headers are ISO-8859-1.

You will notice that no browser correctly follows this.

...sigh.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/




