[Web-SIG] Request for Comments on upcoming WSGI Changes
ianb at colorstudy.com
Tue Sep 22 07:09:17 CEST 2009
On Mon, Sep 21, 2009 at 6:16 PM, Graham Dumpleton <graham.dumpleton at gmail.com> wrote:
> > Of course you can directly use `environ['some_key']` if you know you'll
> > get the 'right' encoding all the time. But when the encoding changes,
> > you'll have to fix all your middlewares.
> > Am I missing something?
> For one, we aren't talking about arbitrary keys needing this treatment.
> We are only talking about SCRIPT_NAME and PATH_INFO.
OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO, and
introduce two equivalent variables that hold the values before URL decoding.
So if you request /fran%e7cois then environ['PATH_INFO_RAW'] is
'/fran%e7cois'.
This will be quite disruptive, as these variables are frequently accessed
directly (libraries that expose them as attributes can just turn them into
properties that do the URL decoding, using UTF-8). But it's an easy fix at
least. I would actually want to specify that if we add this key, we
disallow the old keys, since terrible confusion could ensue from having both
in the environ. This also fixes the problem of not being able to
distinguish %2F from /, which isn't a big problem but is annoying, and
hides meaningful information. (I believe the relevant spec does
distinguish between these two values; i.e., ideally decoding should happen
on path segments, each segment separated by a real /.)
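To make the idea concrete, here is a rough sketch of what a library-side fix might look like. It assumes Python 3 and the proposed PATH_INFO_RAW key; the Request class and its attribute names are hypothetical, made up for illustration, and not part of any existing library or of the WSGI spec.

```python
from urllib.parse import unquote


class Request:
    """Hypothetical wrapper around a WSGI environ (illustration only)."""

    def __init__(self, environ):
        self.environ = environ

    @property
    def path_info(self):
        # Decode the whole raw path on access, assuming UTF-8.
        # Bytes that are not valid UTF-8 become U+FFFD instead of raising.
        raw = self.environ['PATH_INFO_RAW']
        return unquote(raw, encoding='utf-8', errors='replace')

    @property
    def path_segments(self):
        # Split on the real '/' first, then decode each segment, so an
        # escaped %2F stays inside its segment rather than splitting it.
        raw = self.environ['PATH_INFO_RAW']
        return [unquote(seg, encoding='utf-8', errors='replace')
                for seg in raw.split('/')]


req = Request({'PATH_INFO_RAW': '/a%2Fb/fran%C3%A7ois'})
whole = req.path_info          # the %2F is indistinguishable from a real '/'
segments = req.path_segments   # the %2F survives inside the 'a/b' segment
```

Decoding the whole string loses the difference between %2F and /, while decoding per segment keeps it. Note that the /fran%e7cois example above is Latin-1 encoded, so a UTF-8 decode of it yields a replacement character rather than a ç.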
If we do that, then the only really tricky thing left is HTTP_COOKIE. The
Cookie header is a mess, so HTTP_COOKIE will be a mess too, and we
just have to figure out a hacky way to deal with that. Maybe
surrogateescape, but probably just Latin-1 would be fine (and easy to do in
Ian Bicking | http://blog.ianbicking.org |