[Web-SIG] Request for Comments on upcoming WSGI Changes
graham.dumpleton at gmail.com
Tue Sep 22 07:21:13 CEST 2009
2009/9/22 Ian Bicking <ianb at colorstudy.com>:
> On Mon, Sep 21, 2009 at 6:16 PM, Graham Dumpleton
> <graham.dumpleton at gmail.com> wrote:
>> > Of course you can directly use `environ['some_key']` if you know you'll
>> > get the 'right' encoding all the time. But when the encoding changes,
>> > you'll have to fix all your middlewares.
>> > Am I missing something?
>> For one, we aren't talking about arbitrary keys needing this treatment.
>> We are only talking about SCRIPT_NAME and PATH_INFO.
> OK, another proposal entirely: we kill SCRIPT_NAME and PATH_INFO, and
> introduce two equivalent variables that hold the NOT url-decoded values. So
> if you request /fran%e7cois then environ['PATH_INFO_RAW'] is '/fran%e7cois'.
> This will be quite disruptive, as these are variables that are frequently
> accessed directly (libraries that expose them as attributes can just turn
> them into properties that do URL decoding, using UTF8). But it's an easy
> fix at least. I would actually want to specify that if we added this key,
> we should disallow the old keys -- terrible confusion could ensue from both
> in the environ. This also fixes the problem with not being able to
> distinguish %2F from /, which isn't a big problem but is annoying, and is
> hiding meaningful information. (I believe the relevant spec does
> distinguish between these two values -- i.e., ideally decoding should happen
> on path segments, each segment separated by a real /.)
> If we do that, then the only really tricky thing left is HTTP_COOKIE, and
> since the Cookie header is a mess then HTTP_COOKIE will be a mess and we
> just have to figure out a hacky way to deal with that. Maybe
> surrogateescape, but probably just Latin1 would be fine (and easy to do in
> Python 2).
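As a rough illustration of why Latin1 is a workable fallback for HTTP_COOKIE: every byte value maps to exactly one code point, so the original bytes can always be recovered. The helper names and the sample cookie value below are made up for this sketch:

```python
def header_bytes_to_str(raw):
    # Latin-1 maps each byte 0-255 to the code point with the same value,
    # so this decode can never fail and never discards information.
    return raw.decode("latin-1")


def header_str_to_bytes(value):
    # Inverse of the above, valid for any string produced by it.
    return value.encode("latin-1")


# Assumed input: a client that sent a UTF-8 encoded cookie value.
raw_cookie = b"name=fran\xc3\xa7ois"
as_str = header_bytes_to_str(raw_cookie)

# The round trip is lossless, even though as_str itself looks garbled.
assert header_str_to_bytes(as_str) == raw_cookie

# Code that knows the real encoding can recover the intended text.
recovered = header_str_to_bytes(as_str).decode("utf-8")
```

The string seen by the application is not meaningful text in this case, which is the "hacky" part: it is only a transport for the bytes.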
That may be fine for pure Python web servers, where you control the
split of REQUEST_URI into SCRIPT_NAME and PATH_INFO in the first place,
but you don't have that luxury in Apache or via FASTCGI/SCGI/CGI etc.,
as the split is done by the web server. Also, as pointed out in my blog,
because of rewrites in the web server, it may be difficult to map
SCRIPT_NAME and PATH_INFO back onto the REQUEST_URI provided in order
to reclaim the original characters. There is also the problem that
FASTCGI often totally stuffs up the SCRIPT_NAME/PATH_INFO split anyway,
and manual overrides are needed to tweak them.
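
To illustrate the first difficulty, here is a small sketch, assuming Python 3's urllib.parse and a made-up request. The function rebuild_path is hypothetical and stands in for any attempt to recover the original path from values the server has already decoded:

```python
from urllib.parse import quote, unquote


def rebuild_path(script_name, path_info):
    """Naively re-quote the already-decoded values handed over by the server."""
    return quote(script_name + path_info)


# Assumed request path as sent by the client.
original = "/app/a%2Fb"

# What the application sees once the server has split and decoded it.
script_name = "/app"
path_info = unquote("/a%2Fb")

rebuilt = rebuild_path(script_name, path_info)

# The encoded slash cannot be told apart from a real one any more.
assert rebuilt != original
```

Server-side rewrites make this worse, since REQUEST_URI may then not correspond to SCRIPT_NAME plus PATH_INFO at all.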