[Web-SIG] problem with wsgiref.util.request_uri and decoded uri

Andrew Clover and-py at doxdesk.com
Wed Sep 10 20:18:41 CEST 2008


Manlio Perillo wrote:

> On the other hand, if the WSGI gateway *does* decode the URI,
> I can no longer handle '/' in the URI.

Correct. CGI requires PATH_INFO to be URL-decoded, so '%2F' is 
indistinguishable from '/' by the time it gets to the application. And 
WSGI inherits CGI's flaws for compatibility.
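
A minimal sketch of that loss of information, assuming the gateway 
decodes PATH_INFO the same way urllib.parse.unquote does (the paths are 
made up for illustration):

```python
from urllib.parse import unquote

# Two different request paths, as a client might send them.
literal_slash = unquote("/files/a%2Fb")   # client escaped the slash
two_segments = unquote("/files/a/b")      # client sent a real separator

# After decoding, the application cannot tell them apart.
same_after_decoding = (literal_slash == two_segments)
print(same_after_decoding)
```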

request_uri is doing the right thing in assuming that if you got a 
literal '%40' in your already-decoded PATH_INFO, it must originally have 
been a '%2540'.
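
This can be checked against the standard library. The sketch below uses 
a hand-built, hypothetical environ (a real gateway supplies many more 
keys) whose PATH_INFO contains a literal '%40'; request_uri should 
escape the '%' again when it rebuilds the URI:

```python
from wsgiref.util import request_uri

# Hypothetical minimal environ, just enough for request_uri to work.
environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/user/a%40b",   # already decoded once by the gateway
}

# The literal '%' is re-quoted as '%25' in the rebuilt URI.
rebuilt = request_uri(environ)
print(rebuilt)
```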

It is an irritating limitation, but so far not irritating enough for an 
optional workaround to have made its way into non-CGI-based WSGI servers.

It may become a bigger irritation as we move to Py3K, and get stuck with 
decoded top-bit-set bytes being turned into Unicode using the system 
encoding (which is likely to be wrong). Windows already suffers from 
similar problems, as its environment variables are natively Unicode and 
its system encoding is never UTF-8 (which is the most likely encoding 
for path parts).
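
To illustrate the kind of damage involved, here is a small sketch. It 
assumes the client sent UTF-8 and uses cp1252 purely as a stand-in for a 
non-UTF-8 system encoding; neither choice comes from any particular 
gateway:

```python
from urllib.parse import unquote_to_bytes

# The raw bytes a gateway ends up with after %-decoding the path.
raw = unquote_to_bytes("/caf%C3%A9")

as_utf8 = raw.decode("utf-8")      # what the client most likely meant
as_cp1252 = raw.decode("cp1252")   # assumed system encoding, for illustration

# The two interpretations disagree, so the path arrives mangled.
print(as_utf8 == as_cp1252)
```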

> Where can I find information about alternative encoding schemes?

It's easy enough to roll your own. For example, htmlform uses a scheme 
that encodes path parts as '+XX' instead of '%XX', so the escapes 
survive the gateway's URL-decoding.

     import re

     # Escape everything except RFC 2396 unreserved characters.
     encode_re = re.compile(r"[^-_.!~*()'0-9a-zA-Z]")
     # Match '+' followed by exactly two hex digits.
     decode_re = re.compile(r'\+([0-9a-fA-F]{2})')

     def encode(s):
         return encode_re.sub(lambda m: '+%02X' % ord(m.group()), s)

     def decode(s):
         return decode_re.sub(lambda m: chr(int(m.group(1), 16)), s)
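
As a rough sketch of how the same idea might look once path parts are 
Unicode strings, the variant below encodes to bytes first so that every 
escape stays two hex digits. The names encode_part and decode_part and 
the UTF-8 default are assumptions for illustration, not part of 
htmlform:

```python
import re
import string

# Same unreserved set as the scheme above; every other byte becomes '+XX'.
SAFE = set(string.ascii_letters + string.digits + "-_.!~*()'")
ESCAPE_RE = re.compile(rb"\+([0-9A-Fa-f]{2})")

def encode_part(text, charset="utf-8"):
    """Escape one path part byte by byte (charset is an assumption)."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "+%02X" % byte)
    return "".join(out)

def decode_part(token, charset="utf-8"):
    """Reverse encode_part: turn '+XX' escapes back into bytes, then text."""
    raw = ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]),
                        token.encode("ascii"))
    return raw.decode(charset)

# A slash inside a path part no longer looks like a separator.
token = encode_part("a/b")
print(token, decode_part(token))
```

Because '+' itself is outside the safe set, it is escaped too, which 
keeps decoding unambiguous.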

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

