[Web-SIG] Request for Comments on upcoming WSGI Changes
and-py at doxdesk.com
Tue Sep 22 18:07:14 CEST 2009
> Armin is fast asleep now, so it's my shift.
Heh. It's a multiple-man job keeping up with this monster thread!
> The URLs don't break.
Not in themselves. It's just that the language of the PEP implies that
fixing them up would contravene the spec:
>> The application MUST use [the encoding guess for PATH_INFO] to decode
>> the ``'QUERY_STRING'`` as well.
This isn't appropriate even as a SHOULD: the guessed encoding for
PATH_INFO is very likely to be wrong, in particular for cases where the
path was purely ASCII.
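To make the ASCII case concrete, here is a minimal sketch of the UTF-8-first, ISO-8859-1-fallback guess as I read the draft. The function name `guess_uri_encoding` and the sample bytes are illustrative assumptions, not the PEP's. A purely ASCII path always "passes" as UTF-8, so the guess tells you nothing about how the query was encoded:

```python
def guess_uri_encoding(raw_path: bytes) -> str:
    """Hypothetical gateway rule: UTF-8 if the path decodes, else ISO-8859-1."""
    try:
        raw_path.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "iso-8859-1"


# An ASCII-only path, plus query bytes (after percent-decoding) that the
# client actually encoded as cp1251.
path = b"/search"
query = b"q=\xcf\xf0\xe8"

guessed = guess_uri_encoding(path)        # ASCII is valid UTF-8, so 'utf-8'
wrong = query.decode(guessed, "replace")  # replacement characters, not text
right = query.decode("cp1251")            # what the application expected
```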
The application (or a library/framework acting on its behalf) should be
allowed to decode QUERY_STRING using whatever encoding it is expecting.
Disallowing anything other than UTF-8 (and ISO-8859-1, in a very
unreliable way) makes it impossible to have queries in any other
encoding at all and still comply with the spec, which is undesirable.
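As a sketch of what "whatever encoding it is expecting" could look like on the application side, the helper below parses QUERY_STRING into name/value pairs using a caller-chosen codec. It assumes the gateway hands QUERY_STRING over as a str whose code points are the raw request bytes (an ISO-8859-1 passthrough); that is an assumption for illustration, not what the draft specifies. `decode_query` is a hypothetical name.

```python
from urllib.parse import unquote_to_bytes


def decode_query(environ, encoding="utf-8", errors="replace"):
    """Hypothetical helper: parse QUERY_STRING with the app's own encoding.

    Assumes QUERY_STRING is a str holding the raw bytes as ISO-8859-1
    code points, so encoding it back to ISO-8859-1 is lossless.
    """
    raw = environ.get("QUERY_STRING", "").encode("iso-8859-1")
    pairs = []
    for field in raw.split(b"&"):
        if not field:
            continue  # skip empty fields, e.g. from a trailing '&'
        name, _, value = field.partition(b"=")
        # HTML form encoding uses '+' for space; undo it before %-decoding.
        name = unquote_to_bytes(name.replace(b"+", b" "))
        value = unquote_to_bytes(value.replace(b"+", b" "))
        pairs.append((name.decode(encoding, errors),
                      value.decode(encoding, errors)))
    return pairs
```

A Russian-language form handler might call `decode_query(environ, "cp1251")` while the rest of the site uses the UTF-8 default. The point is that the choice lives with the code that knows which encoding the page was served in.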
If this sentence is removed, and `wsgi.uri_encoding` is guaranteed to be
a. definitive and reliable, or
I'm pretty much happy. What I don't want is for half of future WSGI
servers/gateways to decide they have to provide *some* value for
`wsgi.uri_encoding` even if they're not sure it's the right
one. Then we're back to square one.
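A minimal sketch of a gateway that never guesses, assuming a design where `wsgi.uri_encoding` is set only when the server has been explicitly configured with an encoding, and is otherwise left out entirely while the bytes pass through losslessly as ISO-8859-1. `path_environ` and `configured_encoding` are hypothetical names for illustration.

```python
def path_environ(raw_path: bytes, configured_encoding=None) -> dict:
    """Hypothetical gateway helper for the PATH_INFO part of environ.

    The key is present only when the encoding is definitive (explicit
    configuration); otherwise it is omitted rather than filled with a guess.
    """
    if configured_encoding is not None:
        # Strict decode: a mismatch raises (the server could answer 400)
        # instead of silently producing a wrong string.
        return {
            "PATH_INFO": raw_path.decode(configured_encoding),
            "wsgi.uri_encoding": configured_encoding,
        }
    # No reliable knowledge: pass the bytes through losslessly.
    return {"PATH_INFO": raw_path.decode("iso-8859-1")}
```

With this rule, an application seeing the key can trust it, and an application not seeing it knows it must decide for itself.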
> if it is known that an application or some subset of
> URLs will always be receiving a request as non UTF-8, then it should
> employ code in those cases to always transcode it to the required
Yep, agreed. I think the PEP should clarify that; at the moment it
says that transcoding is something you should only do for the
ISO-8859-1 case, but if you actually followed that advice you'd get
highly inconsistent results. Perhaps we're at cross-purposes as to what
exactly constitutes 'middleware'...
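For the "subset of URLs" case, transcoding would happen regardless of what the server first chose, not only for ISO-8859-1. A sketch as WSGI middleware, assuming `wsgi.uri_encoding`, when present, names exactly the codec the gateway used (so the raw bytes can be recovered), and that without it the bytes were passed through as ISO-8859-1. `TranscodeMiddleware` and its parameters are hypothetical.

```python
class TranscodeMiddleware:
    """Hypothetical WSGI middleware: re-decode PATH_INFO for URL prefixes
    known to receive a non-UTF-8 encoding."""

    def __init__(self, app, prefixes, encoding):
        self.app = app
        self.prefixes = tuple(prefixes)  # bytes prefixes, e.g. b"/ru/"
        self.encoding = encoding

    def __call__(self, environ, start_response):
        # Undo the gateway's decoding to get the raw bytes back. This only
        # works if wsgi.uri_encoding (when present) is definitive.
        used = environ.get("wsgi.uri_encoding") or "iso-8859-1"
        raw = environ.get("PATH_INFO", "").encode(used)
        if raw.startswith(self.prefixes):
            environ = dict(environ)  # don't mutate the caller's dict
            environ["PATH_INFO"] = raw.decode(self.encoding, "replace")
            environ["wsgi.uri_encoding"] = self.encoding
        return self.app(environ, start_response)
```

Wrapping an app as `TranscodeMiddleware(app, [b"/ru/"], "cp1251")` keeps the decision per URL subset, where the application actually knows the answer. If the server had filled in a guessed `wsgi.uri_encoding`, the `encode(used)` step would recover the wrong bytes, which is the square-one problem again.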
> The other fallback is that a specific WSGI server could elect to
> provide an option to not use 'UTF-8' as the first choice for decoding
I really, *really* hope this does not happen. That just brings us more
> Whether surrogateescape gives a better solution I have no idea at this
Yeah... I'm highly suspicious of surrogateescape in a web context, and
personally my code will be deliberately filtering out all the lone
surrogate characters it produces. I can see it being a possible way to
smuggle unwanted sequences (such as overlong UTF-8 encodings) through
filters, potentially causing endless security problems. But we'll see...
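A small sketch of the concern, using Python's real `surrogateescape` error handler. The bytes `\xc0\xaf` are an overlong encoding of "/", which is invalid UTF-8, so surrogateescape turns them into the lone surrogates U+DCC0 and U+DCAF instead of failing. A string check like `"../" in path` sees nothing, yet encoding back with surrogateescape reproduces the original bytes for whatever lenient decoder sits downstream. `has_escaped_bytes` and `require_clean` are hypothetical helpers.

```python
def has_escaped_bytes(text: str) -> bool:
    """True if text contains U+DC80..U+DCFF, the lone surrogates that the
    'surrogateescape' handler uses to carry undecodable bytes."""
    return any("\udc80" <= ch <= "\udcff" for ch in text)


def require_clean(text: str) -> str:
    """Reject (rather than strip) input that carries escaped bytes."""
    if has_escaped_bytes(text):
        raise ValueError("request contains bytes that are not valid UTF-8")
    return text


raw = b"/files/..\xc0\xafetc"
smuggled = raw.decode("utf-8", "surrogateescape")
```

Rejecting is chosen here over deleting the characters, because deleting them can join their neighbours into exactly the sequence a filter was looking for.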