[Web-SIG] WSGI Amendments thoughts: the horror of charsets
and-py at doxdesk.com
Fri Nov 14 22:23:35 CET 2008
Ian Bicking wrote:
> This is something messed up with CGI on NT, and whatever server you are
> using, and perhaps the CGI adapter (maybe there's a way to get the raw
> environment without any encoding, for example?)
Python decodes the environ to its own copy (wrapped in os.environ) at
interpreter startup time; there's no way to query the real ‘live’
environment that I know of. It'd require a C extension.
> Honestly I don't know if anyone is doing anything with
> WSGI and Python 3.
I know Graham has done some work on mod_wsgi for 3.0, but no, I don't
know anyone using it in anger.
Is it worth submitting patches to simple_server to make it run on 3.0?
Is it too late to include at this stage anyway? Shipping 3.0 with a
non-functional wsgiref is a bit embarrassing.
> I assume there is some way to get at the bytes in the environment, if not
> then that is a Python 3 bug.
There is not, and this appears to be deliberate.
> I think it might be feasible to support an encoded version of
> SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names,
> and I don't know of any particular standard to base those names on),
> moving from the two keys to a single REQUEST_URI is not feasible.
That's certainly a possibility, but I feel it's easier to hitch a ride
on the existing header, which despite being non-standard is still quite
> I guess you'd probably count segments, try to catch %2f (where the
> segments won't match up), and then double check that the decoded
> REQUEST_URI matches SCRIPT_NAME+PATH_INFO.
I'm currently testing with just the segment counting. It's only
necessary that the segments from SCRIPT_NAME are matched and stripped,
and those are extremely unlikely to contain ‘%2F’ because:
  - there aren't many filesystems that can accept ‘/’ as a filename
    character. RISC OS is the only one I can think of, and it by
    convention swaps ‘/’ and ‘.’ to compensate as it is, so even
    there you couldn't use ‘%2F’;
  - there aren't many webservers that can map a file or alias to a
    path containing ‘%2F’;
  - no-one wants to mount a webapp alias at such a weird name; it's
    only in the section corresponding to PATH_INFO that ‘%2F’ might
    ever be of use in practice.
In the worst case, many applications already know and can strip the URL
at which they're mounted, but unless there's a legitimate ‘%2F’ in their
SCRIPT_NAME it doesn't actually matter.
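
As a rough sketch of the segment counting described above (not the exact
code under test): count the ‘/’-separated segments in the decoded
SCRIPT_NAME, strip that many from the undecoded REQUEST_URI, and treat
the remainder as the raw PATH_INFO. The function name split_request_uri
is made up for illustration, and it assumes REQUEST_URI holds a plain
path (plus optional query string), not an absolute URI.

```python
def split_request_uri(request_uri, script_name):
    """Split an undecoded REQUEST_URI into (raw_script_name, raw_path_info).

    Hypothetical helper: assumes request_uri starts with '/' and that
    SCRIPT_NAME itself contains no encoded '%2F' segments.
    """
    # Only the path part matters; drop any query string.
    raw_path = request_uri.split('?', 1)[0]

    # A decoded SCRIPT_NAME such as '/app/mount' has two segments.
    n_segments = script_name.count('/')

    # The leading '/' produces an empty first element, hence the +1.
    parts = raw_path.split('/')
    raw_script_name = '/'.join(parts[:n_segments + 1])

    rest = parts[n_segments + 1:]
    raw_path_info = '/' + '/'.join(rest) if rest else ''
    return raw_script_name, raw_path_info


if __name__ == '__main__':
    print(split_request_uri('/app/mount/a%2Fb/c?x=1', '/app/mount'))
```

Because only the segment count of SCRIPT_NAME is used, an encoded ‘%2F’
in the PATH_INFO part survives untouched in the second return value.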
> frankly IIS is probably less relevant to most developers than CGI.
You and I may not favour it, but it's ≈35% of the world out there, not
something we can afford to ignore IMO.
> So if IIS has problems with PATH_INFO, the WSGI adapter
> (be it CGI or otherwise) should be configured to fix those problems up
What I'm saying is that neither Apache's nor IIS's behaviour can be
considered clearly correct or wrong at this point, and there is no way a
WSGI adapter living underneath them *can* fix up the differences.
(There is a problem with PATH_INFO that a WSGI adapter *could* clear
up, which is that IIS makes PATH_INFO the entire path including
SCRIPT_NAME. I'm not sure whether it's worth fixing that up in the
adapter layer though... it's possible some frameworks are already
dealing with it, and might even be relying on it!)
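
For illustration only, an adapter-level fix-up for that IIS quirk might
look something like the sketch below. The name fix_iis_path_info is
hypothetical, and detecting IIS by a SERVER_SOFTWARE value beginning
with ‘Microsoft-IIS’ is an assumption; it also ignores case differences
between the two variables, which may matter on Windows.

```python
def fix_iis_path_info(environ):
    """Return an environ whose PATH_INFO no longer repeats SCRIPT_NAME.

    Hypothetical sketch: only acts when the server looks like IIS and
    PATH_INFO really does begin with the whole of SCRIPT_NAME.
    """
    server = environ.get('SERVER_SOFTWARE', '')
    script_name = environ.get('SCRIPT_NAME', '')
    path_info = environ.get('PATH_INFO', '')

    # Leave other servers (and root-mounted apps) alone.
    if not server.startswith('Microsoft-IIS') or not script_name:
        return environ

    if path_info.startswith(script_name):
        rest = path_info[len(script_name):]
        # Only strip on a segment boundary, so '/app' does not eat '/apple'.
        if rest == '' or rest.startswith('/'):
            fixed = dict(environ)
            fixed['PATH_INFO'] = rest
            return fixed
    return environ
```

A framework that already compensates for the IIS behaviour would end up
stripping twice if this ran underneath it, which is the risk mentioned
above.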
mailto:and at doxdesk.com