[Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments

Ian Bicking ianb at colorstudy.com
Fri Mar 18 00:25:11 CET 2011


I'll just add that *if* you can design your URL space (you didn't just
inherit one), and you need to tell the '/' that separates path segments
apart from a '/' inside a value, you can use URLs like:
  /item/{some/value}/view

And then use the matching {}'s to figure out that "some/value" is one path
segment.  This makes it possible, for instance, to use GData (where XML
namespaces can show up in the URL, and they contain /'s, but they need to be
treated as a single value).  It's not perfect, but it does work.
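
As a rough illustration of the brace-matching idea (a minimal sketch, not
code from Paste or any framework): the helper name split_segments is made
up for this example, and it assumes Python 3, where the old urllib.unquote
lives at urllib.parse.unquote. It splits an already-unquoted path on '/'
but treats anything inside a matching {...} pair as one segment.

```python
from urllib.parse import unquote  # Python 3 location of the old urllib.unquote


def split_segments(path):
    """Split a path on '/', keeping anything inside {...} as one segment.

    Hypothetical helper for illustration only. The outermost braces are
    dropped from the result; nested braces are kept verbatim.
    """
    segments = []
    current = []
    depth = 0
    for ch in path:
        if ch == "{":
            depth += 1
            if depth == 1:
                continue  # drop the outermost opening brace
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                continue  # drop the outermost closing brace
        if ch == "/" and depth == 0:
            # A slash outside braces ends the current segment.
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


# After the server unquotes the path, an escaped slash is indistinguishable
# from a real separator:
plain = unquote("/item/some%2Fvalue/view")

# Braces survive unquoting (even if the client sent them as %7B / %7D),
# so the grouping can still be recovered from PATH_INFO:
braced = unquote("/item/%7Bsome/value%7D/view")
parts = split_segments(braced)
```

Here plain comes out as "/item/some/value/view", with no trace of the
original %2F, while parts keeps "some/value" together as a single segment
(the leading '/' produces an empty first element). A value that itself
contains unbalanced braces would confuse this sketch, which is part of why
the approach is not perfect.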


On Thu, Mar 17, 2011 at 4:02 PM, And Clover <and-py at doxdesk.com> wrote:

> On Thu, 2011-03-17 at 19:10 +0100, Florian Friesdorf wrote:
> > I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
> > urllib.unquote the path before setting it in the wsgi environment
>
> I'm afraid it must. This is something the WSGI specification inherits
> from CGI.
>
> Yes, it was a terrible decision to have SCRIPT_NAME and PATH_INFO
> automatically unescaped, as it loses the distinction between ‘%2F’ and
> ‘/’, and has resulted in endless problems with non-ASCII characters that
> could otherwise have been handled perfectly well as %-sequences.
>
> But that decision was taken a couple of decades ago and there's not
> really much we can do about it now. CGI may be an anachronism, but it is
> still widely used and its assumptions are still felt through Apache, IIS
> and WSGI.
>
> > By urllib.unquoting it is not possible to
> > have urllib.quoted slashes within one path segment.
>
> Correct. And neither Apache nor IIS allows %2F to be used within a path
> segment either, so really if you want to write a portable web app you
> simply have to avoid them (along with %00 and %5C). It is not currently
> practical to include any arbitrary byte sequence in a URL path segment,
> even though by the URL specification you should be able to.
>
> It's annoying, it's inelegant, it's limiting. But none of our attempts
> to extend or replace it for non-CGI-based servers (see past list
> discussion on path-info-raw or standardising REQUEST_URI) have come to
> any acceptable conclusion. We are stuck with it for the foreseeable future.
>
> --
> And Clover
> mailto:and at doxdesk.com
> http://www.doxdesk.com
> gtalk:chat?jid=bobince at gmail.com