[Web-SIG] URL quoting in WSGI (or the lack thereof)

Luis Bruno lbruno at 100blossoms.com
Mon Jan 21 12:06:27 CET 2008


I'll top-post my "solution" (scare-quoted because I'm still not sure it
is the smartest idea):

environ['wsgiorg.path-segments'] = ['catalog', 'NEC', 'Computers',
                                    'Laptop', 'LN500/9DW']
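A minimal sketch of middleware that could fill in that key, written for Python 3 (Python 2 had the same functions in urllib and urlparse). It assumes the server passes the still-encoded request target in environ['REQUEST_URI'], which WSGI does not guarantee; the middleware name is hypothetical. The fallback branch shows the information loss that happens when only the already-decoded PATH_INFO is available.

```python
from urllib.parse import unquote, urlsplit


def path_segments_middleware(app):
    """Hypothetical middleware: put decoded path segments in
    environ['wsgiorg.path-segments'] before calling the wrapped app."""

    def wrapper(environ, start_response):
        # Assumption: REQUEST_URI holds the raw, still %-encoded request
        # target (e.g. '/catalog/LN500%2F9DW?x=1'); WSGI does not require it.
        raw = environ.get('REQUEST_URI')
        if raw is not None:
            # Split on the real '/' delimiters first, then decode each
            # segment, so '%2F' survives as a '/' inside one segment.
            path = urlsplit(raw).path
            segments = [unquote(s) for s in path.split('/') if s]
        else:
            # PATH_INFO is already decoded, so an original '%2F' now looks
            # like a delimiter and the segment gets cut in two.
            path = environ.get('PATH_INFO', '')
            segments = [s for s in path.split('/') if s]
        environ['wsgiorg.path-segments'] = segments
        return app(environ, start_response)

    return wrapper
```

Because REQUEST_URI covers the whole path, this list would include any SCRIPT_NAME prefix; a real implementation would have to strip the mount point first.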

Robert Brewer wrote:
> All HTTP URI are /-delimited, and any '/' appearing in a single segment
> that is not intended to participate in the hierarchy semantics must be
> %-encoded before transmitting it over HTTP.
I wholeheartedly agree. And your explanation is clearer than mine.
>> IMHO [changing CP's wsgiserver to do decoding] is the wrong answer    
> Why?
>   
Because then I'm stuck monkey-patching every WSGI server (and/or stuck
using my own URL dispatcher) so that I don't lose the information that
one of the forward slashes is NOT a path delimiter. If I read you
correctly, you said that %-encoding is exactly how a slash that does not
participate in the hierarchy semantics gets marked, so I think you'll
agree with me on this.
> You have to explain why you think the application should receive %XX encoded
> URI's instead of decoded ones. What's the benefit? I only see a con:
> every piece of middleware that cares has to repeat the decoding of
> PATH_INFO and SCRIPT_NAME, wasting CPU and memory.
>   
I was aware of this trade-off, which is why I'm still not sure the
application should receive %-encoded URIs. My app was forced to split
the URL on the '/' delimiters itself; if I can get the framework to do
that job while dispatching, so much the better. Hence the solution I
top-posted. My problem arises when I output a link built by suitably
%-encoding these path segments:

from urllib.parse import quote  # urllib.quote in Python 2
'/'.join(quote(s, safe='') for s in ['NEC', 'Computers', 'Laptop', 'LN500/9DW'])

And after the user clicks that link, the framework gives me the
following (Routes has no way to avoid this when Paste is the one
decoding the whole path):

['NEC', 'Computers', 'Laptop', 'LN500', '9DW']
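To make the round trip concrete, a small sketch using the standard library (Python 3 names; quote and unquote lived directly in urllib in Python 2). It contrasts decoding the whole path before splitting, which is effectively what happens above, with splitting first and decoding each segment afterwards.

```python
from urllib.parse import quote, unquote

segments = ['NEC', 'Computers', 'Laptop', 'LN500/9DW']

# Encode each segment on its own; safe='' makes quote() escape '/' too,
# so only the joining slashes remain delimiters.
link = '/'.join(quote(s, safe='') for s in segments)
# link == 'NEC/Computers/Laptop/LN500%2F9DW'

# Decode the whole path, then split: the escaped slash is lost.
decode_then_split = unquote(link).split('/')
# ['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

# Split into segments, then decode each one: the original list comes back.
split_then_decode = [unquote(s) for s in link.split('/')]
# ['NEC', 'Computers', 'Laptop', 'LN500/9DW']
```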

Think of dispatching to a ``callable(*segments, **urlvariables)``. I
think we'll agree this is not what the app writer intended, and I'm out
of luck if the WSGI server or dispatcher is the one doing the
URL-decoding.
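A toy illustration of why that split hurts; show_product and dispatch are hypothetical names, not part of Routes or Paste. With the intended four segments the call works; with the five segments the server hands back, the arguments do not even bind to the handler's signature.

```python
def show_product(vendor, category, kind, model, **urlvariables):
    # Hypothetical handler: expects exactly four positional path segments.
    fmt = urlvariables.get('format', 'html')
    return '%s %s (%s/%s) as %s' % (vendor, model, category, kind, fmt)


def dispatch(handler, segments, urlvariables=None):
    # Hypothetical dispatcher in the callable(*segments, **urlvariables) style.
    return handler(*segments, **(urlvariables or {}))


intended = ['NEC', 'Computers', 'Laptop', 'LN500/9DW']
received = ['NEC', 'Computers', 'Laptop', 'LN500', '9DW']

result = dispatch(show_product, intended, {'format': 'json'})
# result == 'NEC LN500/9DW (Computers/Laptop) as json'

try:
    dispatch(show_product, received)
    failed = False
except TypeError:
    # Five positional segments cannot bind to four positional parameters.
    failed = True
```

A handler with an extra optional positional parameter would be worse: the stray '9DW' would bind to it silently instead of raising.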
> According to [1], the right answer is "yes":
>   
I'll see your CGI draft and raise you the URI spec [2]. Once you've
read its last sentence, you'll see how unoriginal my top-posted
solution was:
> 2.4.2. When to Escape and Unescape
>
> A URI is always in an "escaped" form, since escaping or unescaping a
> completed URI might change its semantics.  Normally, the only time
> escape encodings can safely be made is when the URI is being created
> from its component parts; each component may have its own set of
> characters that are reserved, so only the mechanism responsible for
> generating or interpreting that component can determine whether or
> not escaping a character will change its semantics. Likewise, a URI
> must be separated into its components before the escaped characters
> within those components can be safely decoded.
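The same rule applies across components, not just within the path. A short sketch with the Python 3 standard library (the URI is made up for illustration) where decoding first also corrupts the query string:

```python
from urllib.parse import parse_qsl, unquote, urlsplit

uri = '/catalog/LN500%2F9DW?q=rock%26roll&page=2'

# Wrong order: decode the whole URI, then separate it into components.
naive = urlsplit(unquote(uri))
# naive.path == '/catalog/LN500/9DW'
# naive.query == 'q=rock&roll&page=2'  (the escaped '&' now splits a value)

# Right order: separate into components, then decode inside each one.
parts = urlsplit(uri)
path_segments = [unquote(s) for s in parts.path.split('/') if s]
query = parse_qsl(parts.query)  # splits on '&' before decoding each pair
# path_segments == ['catalog', 'LN500/9DW']
# query == [('q', 'rock&roll'), ('page', '2')]
```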
[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6
[2] http://www.ietf.org/rfc/rfc2396.txt (RFC 2396). There is also an
informational CGI RFC somewhere, which I skimmed on my way here to
grumble.

-- 
Luís Bruno

