[Web-SIG] URL quoting in WSGI (or the lack thereof)
brian at briansmith.org
Tue Jan 22 17:44:43 CET 2008
Luis Bruno wrote:
> Ian Bicking wrote:
> > But relating REQUEST_URI with SCRIPT_NAME/PATH_INFO is awkward and
> > having the information in duplicate places can lead to errors and
> > unclear situations if they don't match up properly.
I don't understand this argument. WSGI gateways just need to parse the
request URL correctly, and then everything *will* match up correctly,
AFAICT. Providing an undecoded REQUEST_URI that an application can parse
on its own is much better than what CherryPy is doing, and it is useful
for other reasons as well.
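As a rough sketch of what parsing an undecoded REQUEST_URI could look like on the application side (Python 3; the function name path_segments and the example URL are illustrative assumptions, not part of any specification), the path is split on "/" first and each segment is decoded only afterwards:

```python
from urllib.parse import unquote


def path_segments(request_uri):
    """Split the raw path on "/" first, then percent-decode each segment.

    Hypothetical helper: assumes request_uri is the undecoded
    request target, e.g. "/Laptops/LN500%2F9DW/?sort=price".
    """
    raw_path = request_uri.partition("?")[0]  # drop the query string
    return [unquote(segment) for segment in raw_path.split("/")]


segments = path_segments("/Laptops/LN500%2F9DW/?sort=price")
print(segments)  # ['', 'Laptops', 'LN500/9DW', '']
```

Because the split happens before decoding, the encoded slash ends up inside a single segment instead of creating an extra one.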
> I'm going with CherryPy's on this: don't decode "%2F".
CherryPy is not implementing the WSGI 1.0 specification correctly, and
its behavior here is harmful: an application has no way of knowing
whether a "%2F" in PATH_INFO is a slash that was left undecoded or a
literal "%2F" (sent as "%252F") that was decoded.
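The ambiguity can be illustrated with a small model (Python 3). The function decode_except_slash is a hypothetical stand-in for "decode everything except %2F"; it is an assumption made for illustration and not CherryPy's actual code:

```python
import re
from urllib.parse import unquote

ENCODED_SLASH = re.compile(r"(%2[Ff])")


def decode_except_slash(raw_path):
    """Percent-decode raw_path but leave any "%2F" untouched (hypothetical model)."""
    parts = ENCODED_SLASH.split(raw_path)  # the capture group keeps each "%2F" piece
    return "".join(
        part if ENCODED_SLASH.fullmatch(part) else unquote(part)
        for part in parts
    )


# An encoded slash and an encoded literal "%2F" become indistinguishable:
print(decode_except_slash("/Laptops/LN500%2F9DW/"))    # /Laptops/LN500%2F9DW/
print(decode_except_slash("/Laptops/LN500%252F9DW/"))  # /Laptops/LN500%2F9DW/
```

Two different request paths yield the same PATH_INFO, so the application cannot tell which one the client sent.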
> > Luis Bruno wrote:
> >> I was not amused to see egg:Paste#http urldecoding the
> >> whole PATH_INFO.
> > Unfortunately this is in the WSGI spec, so it's not
> > Paste#http so much as WSGI that demands this.
> I skimmed PEP 333 before grumbling and I've just re-read it;
> didn't find it, unless you're referring to the code in "URL
> Reconstruction" section.
> If you're referring[*] to the CGI 1.1 draft linked in "environ
> Variables", I think it supports my position that unquoting(PATH_INFO)
> was not the correct thing to do.
PEP 333 defers the definition of PATH_INFO to the CGI specification:
"The environ dictionary is required to contain these CGI environment
variables, as defined by the Common Gateway Interface specification."
That version of the CGI specification clearly expects PATH_INFO to be
decoded. Section 3.2 says "'enc-path-info' is a URL-encoded version
of PATH_INFO". The implication is that PATH_INFO is *not* URL-encoded.
Section 6.1.6 is more explicit, saying: "The syntax and semantics are
similar to a decoded HTTP URL 'path' token (defined in RFC 2396),
with the exception that a PATH_INFO of "/" represents a single void path
segment."
Furthermore, the URL reconstruction section and the CGI WSGI gateway
both also imply that PATH_INFO has already been decoded.
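The URL reconstruction code in PEP 333 passes SCRIPT_NAME and PATH_INFO through quote() before appending them to the URL. A short sketch (Python 3, using urllib.parse.quote in place of the Python 2 urllib.quote; the example paths are assumptions) of why that only makes sense for an already-decoded PATH_INFO:

```python
from urllib.parse import quote

# If PATH_INFO has been decoded, quoting it yields a valid path,
# although the original "%2F" is no longer recoverable.
decoded_path_info = "/Laptops/LN500/9DW/"
print(quote(decoded_path_info))  # /Laptops/LN500/9DW/

# If PATH_INFO were still encoded, quoting it again would
# double-encode the "%" and produce a different URL.
undecoded_path_info = "/Laptops/LN500%2F9DW/"
print(quote(undecoded_path_info))  # /Laptops/LN500%252F9DW/
```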
> > [/Laptops/LN500%2F9DW/] would be the Right Thing, except for not
> > being WSGI.
> Looks to me like a good candidate for an amendment.
> What's the next step?
Something so fundamental as this cannot be changed with a simple
amendment to the existing specification. Such a change would break
currently-conforming gateways and applications. An amendment that
recommends, but does not require, REQUEST_URI is a much better option.
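If REQUEST_URI were recommended but optional, an application would need a fallback for gateways that do not supply it. A minimal sketch under that assumption (Python 3; raw_path is a hypothetical helper name):

```python
from urllib.parse import quote


def raw_path(environ):
    """Return the undecoded request path, preferring REQUEST_URI when present.

    Hypothetical helper: falls back to re-quoting the decoded
    SCRIPT_NAME and PATH_INFO, which cannot restore an original "%2F".
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is not None:
        return request_uri.partition("?")[0]  # strip the query string
    return quote(environ.get("SCRIPT_NAME", "")) + quote(environ.get("PATH_INFO", ""))
```

Applications running under gateways that omit the key keep working as they do today, which is what keeps such an amendment backward compatible.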