From flo at chaoflow.net Thu Mar 17 19:10:22 2011
From: flo at chaoflow.net (Florian Friesdorf)
Date: Thu, 17 Mar 2011 19:10:22 +0100
Subject: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
Message-ID: <87tyf139f5.fsf@eve.chaoflow.net>

I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
urllib.unquote the path [1] before setting it in the WSGI environment
[2]. The only pre-processing performed on the path between [1] and [2]
concerns slashes ('/'). Because the whole path is urllib.unquoted, it is
not possible to have urllib.quoted slashes within a single path segment.

At least pyramid without routing relies fully on
``environ['PATH_INFO']`` [3]; by commenting out [1] I managed to get
slashes into path segments, which are then handled by pyramid in [4]f.

However, webob.request.BaseRequest would need to be adjusted wherever
PATH_INFO from the environment is used (e.g. [5]).

Reasoning: The path stored in environ['PATH_INFO'] is still a path and
therefore must not be urllib.unquoted; the unquoting must happen only
after the path has been split into segments ([4]).

[1] https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-180
[2] https://bitbucket.org/ianb/paste/src/4f5cfde87603/paste/httpserver.py#cl-217
[3] https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L594
[4] https://github.com/Pylons/pyramid/blob/master/pyramid/traversal.py#L495
[5] https://bitbucket.org/ianb/webob/src/c0bb5309cfca/webob/request.py#cl-265

--
Florian Friesdorf
  GPG FPR: 7A13 5EEE 1421 9FC2 108D BAAF 38F8 99A3 0C45 F083
Jabber/XMPP: flo at chaoflow.net
IRC: chaoflow on freenode,ircnet,blafasel,OFTC
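A minimal sketch of the reasoning in the post above, assuming Python 3,
where urllib.parse.unquote plays the role of the Python 2 urllib.unquote
under discussion. The two function names and the sample path are
illustrative only; they are taken from neither paste nor pyramid. The
sketch compares unquoting the whole path before splitting with splitting
first and unquoting each segment afterwards:

```python
from urllib.parse import quote, unquote


def segments_unquote_first(raw_path):
    """Unquote the whole path, then split on '/'.

    An encoded slash (%2F) turns into a separator, so the segment
    that contained it is broken in two.
    """
    return [s for s in unquote(raw_path).split("/") if s]


def segments_split_first(raw_path):
    """Split on literal '/', then unquote each segment on its own.

    An encoded slash stays inside the segment it belongs to.
    """
    return [unquote(s) for s in raw_path.split("/") if s]


# Build a raw path whose middle segment is the value "some/value".
# safe="" makes quote() encode '/' as %2F as well.
raw = "/item/" + quote("some/value", safe="") + "/view"

print(raw)
print(segments_unquote_first(raw))
print(segments_split_first(raw))
```

Only the second function can recover "some/value" as one segment, and it
can do so only if the server hands over the path still quoted.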

From ianb at colorstudy.com Thu Mar 17 21:10:56 2011
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 17 Mar 2011 15:10:56 -0500
Subject: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
In-Reply-To: <87tyf139f5.fsf@eve.chaoflow.net>
References: <87tyf139f5.fsf@eve.chaoflow.net>
Message-ID:

It's implied by WSGI itself that the path be unquoted; there's no fix short
of changing the specification.

From and-py at doxdesk.com Thu Mar 17 22:02:04 2011
From: and-py at doxdesk.com (And Clover)
Date: Thu, 17 Mar 2011 21:02:04 +0000
Subject: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
In-Reply-To: <87tyf139f5.fsf@eve.chaoflow.net>
References: <87tyf139f5.fsf@eve.chaoflow.net>
Message-ID: <1300395724.2060.14.camel@stalk>

On Thu, 2011-03-17 at 19:10 +0100, Florian Friesdorf wrote:
> I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
> urllib.unquote the path before setting it in the wsgi environment

I'm afraid it must. This is something the WSGI specification inherits
from CGI.

Yes, it was a terrible decision to have SCRIPT_NAME and PATH_INFO
automatically unescaped, as it loses the distinction between '%2F' and
'/', and has resulted in endless problems with non-ASCII characters that
could otherwise have been handled perfectly well as %-sequences.

But that decision was taken a couple of decades ago and there's not
really much we can do about it now.
CGI may be an anachronism, but it is still widely used and its
assumptions are still felt through Apache, IIS and WSGI.

> By urllib.unquoting it is not possible to
> have urllib.quoted slashes within one path segment.

Correct. And neither Apache nor IIS allows %2F to be used within a path
segment either, so really if you want to write a portable web app you
simply have to avoid them (along with %00 and %5C). It is not currently
practical to include any arbitrary byte sequence in a URL path segment,
even though by the URL specification you should be able to.

It's annoying, it's inelegant, it's limiting. But none of our attempts
to extend or replace it for non-CGI-based servers (see past list
discussion on path-info-raw or standardising REQUEST_URI) have come to
any acceptable conclusion. We are stuck with it for the foreseeable.

--
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com
gtalk:chat?jid=bobince at gmail.com

From ianb at colorstudy.com Fri Mar 18 00:25:11 2011
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 17 Mar 2011 18:25:11 -0500
Subject: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
In-Reply-To: <1300395724.2060.14.camel@stalk>
References: <87tyf139f5.fsf@eve.chaoflow.net> <1300395724.2060.14.camel@stalk>
Message-ID:

I'll just add that *if* you can design your URL space (you didn't just
inherit one), and you want to distinguish path segments from values that
contain '/', you can use URLs like:

  /item/{some/value}/view

And then use the matching {}'s to figure out that "some/value" is one path
segment. This makes it possible, for instance, to use GData (where XML
namespaces can show up in the URL, and they contain /'s, but they need to
be treated as a single value). It's not perfect, but it does work.
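
The brace convention described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the helper name
split_braced_path is hypothetical, the input is taken to be an already
unquoted PATH_INFO string, and stripping the outermost braces and
rejecting unbalanced ones are choices made here, not anything the thread
specifies.

```python
def split_braced_path(path):
    """Split an unquoted path on '/', keeping each '{...}' group whole.

    Hypothetical helper: slashes inside matching braces do not act as
    separators, and the outermost pair of braces is removed.
    """
    segments = []
    current = []
    depth = 0
    for ch in path:
        if ch == "{":
            depth += 1
            if depth == 1:
                continue  # drop the outermost opening brace
        elif ch == "}":
            if depth == 0:
                raise ValueError("unbalanced '}' in path: %r" % path)
            depth -= 1
            if depth == 0:
                continue  # drop the outermost closing brace
        elif ch == "/" and depth == 0:
            # A slash outside any braces ends the current segment.
            if current:
                segments.append("".join(current))
            current = []
            continue
        current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '{' in path: %r" % path)
    if current:
        segments.append("".join(current))
    return segments


print(split_braced_path("/item/{some/value}/view"))
```

Because the scheme works on the unquoted path, it does not depend on the
server preserving %2F, which is why it stays portable across CGI-style
gateways.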

From flo at chaoflow.net Fri Mar 18 10:36:27 2011
From: flo at chaoflow.net (Florian Friesdorf)
Date: Fri, 18 Mar 2011 10:36:27 +0100
Subject: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
In-Reply-To:
References: <87tyf139f5.fsf@eve.chaoflow.net>
Message-ID: <8739mk3h44.fsf@eve.chaoflow.net>

On Thu, 17 Mar 2011 15:10:56 -0500, Ian Bicking wrote:
> It's implied by WSGI itself that the path be unquoted; there's no fix short
> of changing the specification.

What is WSGI's solution for path segments containing slashes?

--
Florian Friesdorf

From flo at chaoflow.net Fri Mar 18 10:42:04 2011
From: flo at chaoflow.net (Florian Friesdorf)
Date: Fri, 18 Mar 2011 10:42:04 +0100
Subject: [Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments
In-Reply-To: <8739mk3h44.fsf@eve.chaoflow.net>
References: <87tyf139f5.fsf@eve.chaoflow.net> <8739mk3h44.fsf@eve.chaoflow.net>
Message-ID: <87wrjw22ab.fsf@eve.chaoflow.net>

On Fri, 18 Mar 2011 10:36:27 +0100, Florian Friesdorf wrote:
> On Thu, 17 Mar 2011 15:10:56 -0500, Ian Bicking wrote:
> > It's implied by WSGI itself that the path be unquoted; there's no fix short
> > of changing the specification.
>
> What is WSGI's solution for path segments containing slashes?

Please ignore this post - my mail client played tricks on me and I did not
see your further postings before writing this.

--
Florian Friesdorf