[Web-SIG] URL quoting in WSGI (or the lack thereof)
Luis Bruno
lbruno at 100blossoms.com
Mon Jan 21 12:06:27 CET 2008
I'll top-post my "solution" (scare-quoted because I'm still not sure
this is the smartest idea):
environ['wsgiorg.path-segments'] = ['catalog', 'NEC', 'Computers',
'Laptop', 'LN500/9DW']
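Here is a minimal sketch of how a server could fill that key. It assumes the server still has the raw, undecoded request path. Plain WSGI only exposes the already-decoded PATH_INFO, so where the raw path comes from depends on the server. The helper name path_segments is hypothetical. What matters is the order: split on '/' first, then %-decode each segment.

```python
from urllib.parse import unquote


def path_segments(raw_path):
    """Split a still-%-encoded path on '/' and only then decode each segment.

    Hypothetical helper: raw_path must be the path exactly as it arrived on
    the request line, before any %-decoding.
    """
    # Drop the leading slash(es) so the first segment is not an empty string.
    parts = raw_path.lstrip('/').split('/')
    # Decoding after the split turns '%2F' into a literal '/' inside a
    # segment instead of letting it act as a new delimiter.
    return [unquote(part) for part in parts]


environ = {}
raw = '/catalog/NEC/Computers/Laptop/LN500%2F9DW'
environ['wsgiorg.path-segments'] = path_segments(raw)
print(environ['wsgiorg.path-segments'])
# ['catalog', 'NEC', 'Computers', 'Laptop', 'LN500/9DW']
```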
Robert Brewer wrote:
> All HTTP URIs are /-delimited, and any '/' appearing in a single segment
> that is not intended to participate in the hierarchy semantics must be
> %-encoded before transmitting it over HTTP.
I wholeheartedly agree. And your explanation is clearer than mine.
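The following illustrates that rule with Python 3's urllib.parse.quote. This is not code from either poster. The `safe` parameter of quote() defaults to '/', so a slash inside a segment is left untouched unless you pass safe=''.

```python
from urllib.parse import quote

segment = 'LN500/9DW'

# quote() treats '/' as safe by default, so the slash survives unescaped
# and would later be read as a path delimiter.
default_form = quote(segment)
print(default_form)   # LN500/9DW

# With safe='' the slash inside the segment is %-encoded as %2F.
segment_form = quote(segment, safe='')
print(segment_form)   # LN500%2F9DW
```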
>> IMHO [changing CP's wsgiserver to do decoding] is the wrong answer
> Why?
>
Because then I'm stuck monkey-patching every WSGI server (and/or stuck
using my own URL dispatcher) so that I don't lose the information that
one of the forward slashes is NOT a path delimiter. If I read you
correctly, you said that %-encoding is meant for slashes not
participating in hierarchy semantics, so I think you'll agree with me on this.
> You have to explain why you think the application should receive %XX encoded
> URIs instead of decoded ones. What's the benefit? I only see a con:
> every piece of middleware that cares has to repeat the decoding of
> PATH_INFO and SCRIPT_NAME, wasting CPU and memory.
>
I was aware of this trade-off, which is why I'm still not sure the
application should receive the %-encoded URIs. My app was forced to
split the URL on the '/' delimiters. If I can get the framework to do
that job while dispatching, so much the better; hence the solution I
top-posted. My problem arises when I output a link built by %-encoding
each of these path segments (so the '/' inside 'LN500/9DW' becomes %2F)
and joining them:
'/'.join(quote(s, safe='') for s in ['NEC', 'Computers', 'Laptop', 'LN500/9DW'])
And after the user clicks that link, the framework gives me this (and
Routes has no way to avoid it when Paste is the one decoding the whole
path):
['NEC', 'Computers', 'Laptop', 'LN500', '9DW']
Think of dispatching to a ``callable(*segments, **urlvariables)``. I think
we'll agree this is not what the app writer intended. And I'm out of
luck if the WSGI server or dispatcher is the one doing the URL decoding.
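Here is a small sketch of that dispatch style. The handler show_product and the dispatch() helper are both hypothetical; neither comes from Routes or Paste. The sketch compares two orders of operations. The first decodes the whole path before splitting, which is what a server does when it fills PATH_INFO. The second splits first and then decodes each segment. Python 3's urllib.parse is assumed.

```python
from urllib.parse import quote, unquote


def show_product(vendor, category, kind, model, *, page=1):
    # Hypothetical handler: takes exactly four positional path segments.
    return f'{vendor} / {category} / {kind} / {model} (page {page})'


def dispatch(handler, segments, **urlvariables):
    # Path segments become positional arguments; other URL variables
    # become keyword arguments.
    return handler(*segments, **urlvariables)


segments = ['NEC', 'Computers', 'Laptop', 'LN500/9DW']
# The link as the application emits it, every segment %-encoded:
# '/NEC/Computers/Laptop/LN500%2F9DW'
link = '/' + '/'.join(quote(s, safe='') for s in segments)

# Split first, decode each piece afterwards: the model number stays whole.
good = [unquote(p) for p in link.lstrip('/').split('/')]
print(dispatch(show_product, good, page=2))
# NEC / Computers / Laptop / LN500/9DW (page 2)

# Decode the whole path first, split afterwards: %2F has already
# become a delimiter.
bad = unquote(link).lstrip('/').split('/')
print(bad)  # ['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

error = None
try:
    dispatch(show_product, bad, page=2)
except TypeError as exc:
    # Five positional arguments for a handler that takes four.
    error = exc
print('dispatch failed:', error)
```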
> According to [1], the right answer is "yes":
>
I'll see your CGI draft and raise you the URI spec[2]. When you've read
the last sentence, you'll see how unoriginal the top-posted solution was:
> 2.4.2. When to Escape and Unescape
>
> A URI is always in an "escaped" form, since escaping or unescaping a
> completed URI might change its semantics. Normally, the only time
> escape encodings can safely be made is when the URI is being created
> from its component parts; each component may have its own set of
> characters that are reserved, so only the mechanism responsible for
> generating or interpreting that component can determine whether or
> not escaping a character will change its semantics. Likewise, a URI
> must be separated into its components before the escaped characters
> within those components can be safely decoded.
[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6
[2] http://www.ietf.org/rfc/rfc2396.txt (RFC 2396, section 2.4.2). There is
also a CGI Informational RFC somewhere, which I skimmed on my way here to
grumble.
--
Luís Bruno