From graham.dumpleton at gmail.com Thu Jun 12 10:02:38 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 12 Jun 2008 18:02:38 +1000
Subject: [Web-SIG] Newline values in WSGI response header values.
Message-ID: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>

Can anyone confirm for me what the behaviour should be if someone includes a newline in the value of a WSGI response header?

CGI specification would seem to disallow it and thus WSGI adapter should by rights possibly produce an error if user code does it.

At the moment I know of no WSGI adapter implementation which validates whether a newline appears in the value of a WSGI response header. For many WSGI adapters this means that a header of:

Key1: "Value1\r\nKey2: Value2"

will actually translate into two separate headers being sent back to client.

For a header of:

Key3: "Value3a\r\nValue3b"

in a WSGI adapter which simply passes things through, the client would get an invalid header line, which in general it would ignore. If however this was generated when hosted with a CGI-WSGI adapter, for Apache at least, Apache would generate a 500 error itself due to detecting a header line of invalid format.

Thus, is an embedded newline in value invalid? Would it be reasonable for a WSGI adapter to flag it as an error?

Thanks.

Graham

From sh at defuze.org Thu Jun 12 10:22:51 2008
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Thu, 12 Jun 2008 10:22:51 +0200 (CEST)
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
Message-ID: <57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>

> Can anyone confirm for me what the behaviour should be if someone
> includes a newline in the value of a WSGI response header?
>
> CGI specification would seem to disallow it and thus WSGI adapter
> should by rights possibly produce an error if user code does it.
>
> At the moment I know of no WSGI adapter implementation which validates
> whether a newline appears in the value of a WSGI response header. For
> many WSGI adapters this means that a header of:
>
> Key1: "Value1\r\nKey2: Value2"
>
> will actually translate into two separate headers being sent back to
> client.
>
> For a header of:
>
> Key3: "Value3a\r\nValue3b"
>
> in a WSGI adapter which simply passes things through, the client would
> get an invalid header line, which in general it would ignore. If
> however this was generated when hosted with a CGI-WSGI adapter, for
> Apache at least, Apache would generate a 500 error itself due to
> detecting a header line of invalid format.
>
> Thus, is an embedded newline in value invalid? Would it be reasonable
> for a WSGI adapter to flag it as an error?
>

I might be reading the spec wrong but it doesn't seem to be forbidden by RFC 2616.

Section 4.2 says:

> Any LWS that occurs between field-content MAY be replaced with a single SP before interpreting the field value or forwarding the message downstream.

Then a look at the definition of separators shows us that SP is a valid separator.

Since section 2.1 says:

> Except where noted otherwise, linear white space (LWS) can be included between any two adjacent words (token or quoted-string), and between adjacent words and separators, without changing the interpretation of a field.

It sounds to me that this is a valid construct but a WSGI adapter might consider converting those CRLF into simple SP as said in 2.1 again:

> A recipient MAY replace any linear white space with a single SP before interpreting the field value or forwarding the message downstream.
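
The replacement rule quoted above could be sketched roughly as follows. This is only an illustration, not code from any existing WSGI adapter: the function name `normalize_header_value` is hypothetical, and the pattern assumes the RFC 2616 grammar `LWS = [CRLF] 1*( SP | HT )`. It also collapses whitespace inside quoted strings, which a real adapter might want to avoid.

```python
import re

# RFC 2616: LWS = [CRLF] 1*( SP | HT )
_LWS = re.compile(r"(?:\r\n)?[ \t]+")


def normalize_header_value(value):
    """Hypothetical helper: fold each LWS run to one SP, reject stray CR/LF."""
    folded = _LWS.sub(" ", value)
    # Anything left over is a CR or LF that was not part of a valid LWS.
    if "\r" in folded or "\n" in folded:
        raise ValueError("bare CR/LF in header value: %r" % value)
    return folded
```

A value such as "Value3a\r\n Value3b" (CRLF followed by a space) folds to a single line, while "Value1\r\nKey2: Value2" raises `ValueError` because the CRLF is not followed by a space or tab.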

- Sylvain

--
Sylvain Hellegouarch
http://www.defuze.org

From graham.dumpleton at gmail.com Thu Jun 12 10:38:17 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 12 Jun 2008 18:38:17 +1000
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com> <57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>
Message-ID: <88e286470806120138n6f77332cm22f563267b0e17f2@mail.gmail.com>

2008/6/12 Sylvain Hellegouarch :
>
>> Can anyone confirm for me what the behaviour should be if someone
>> includes a newline in the value of a WSGI response header?
>>
>> CGI specification would seem to disallow it and thus WSGI adapter
>> should by rights possibly produce an error if user code does it.
>>
>> At the moment I know of no WSGI adapter implementation which validates
>> whether a newline appears in the value of a WSGI response header. For
>> many WSGI adapters this means that a header of:
>>
>> Key1: "Value1\r\nKey2: Value2"
>>
>> will actually translate into two separate headers being sent back to
>> client.
>>
>> For a header of:
>>
>> Key3: "Value3a\r\nValue3b"
>>
>> in a WSGI adapter which simply passes things through, the client would
>> get an invalid header line, which in general it would ignore. If
>> however this was generated when hosted with a CGI-WSGI adapter, for
>> Apache at least, Apache would generate a 500 error itself due to
>> detecting a header line of invalid format.
>>
>> Thus, is an embedded newline in value invalid? Would it be reasonable
>> for a WSGI adapter to flag it as an error?
>>
>
> I might be reading the spec wrong but it doesn't seem to be forbidden by
> RFC 2616.
>
> Section 4.2 says:
>
>> Any LWS that occurs between field-content MAY be replaced with a single
> SP before interpreting the field value or forwarding the message
> downstream.
>
> Then a look at the definition of separators shows us that SP is a valid
> separator.
>
> Since section 2.1 says:
>
>> Except where noted otherwise, linear white space (LWS) can be included
> between any two adjacent words (token or quoted-string), and between
> adjacent words and separators, without changing the interpretation of a
> field.
>
> It sounds to me that this is a valid construct but a WSGI adapter might
> consider converting those CRLF into simple SP as said in 2.1 again:
>
>> A recipient MAY replace any linear white space with a single SP before
> interpreting the field value or forwarding the message downstream.

A LWS is:

LWS = [CRLF] 1*( SP | HT )

I.e., not just a single CRLF, but a CRLF followed by a space or tab.

Thus, can't just replace CRLF only with a space.

Anyway, the wording of my question and reference to CGI was a bit wrong, as WSGI response headers are probably more governed by HTTP RFC.

To clarify, what we really have is two cases, the first is return of a value with a valid LWS as specified by HTTP RFC.

If the WSGI adapter is mapping direct to HTTP, then it can pass it straight through. If however the WSGI adapter hosts on top of an interface with CGI like semantics, then it should translate LWS to single space as described.

The second case is an embedded CRLF which isn't followed by space or tab and thus isn't a LWS. This is the case which causes problems and am asking whether it should be detected and flagged as an erroneous response.

Graham

From sh at defuze.org Thu Jun 12 10:58:09 2008
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Thu, 12 Jun 2008 10:58:09 +0200 (CEST)
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <88e286470806120138n6f77332cm22f563267b0e17f2@mail.gmail.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com> <57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com> <88e286470806120138n6f77332cm22f563267b0e17f2@mail.gmail.com>
Message-ID: <42418.195.101.247.164.1213261089.squirrel@mail1.webfaction.com>

> 2008/6/12 Sylvain Hellegouarch :
>>
>>> Can anyone confirm for me what the behaviour should be if someone
>>> includes a newline in the value of a WSGI response header?
>>>
>>> CGI specification would seem to disallow it and thus WSGI adapter
>>> should by rights possibly produce an error if user code does it.
>>>
>>> At the moment I know of no WSGI adapter implementation which validates
>>> whether a newline appears in the value of a WSGI response header. For
>>> many WSGI adapters this means that a header of:
>>>
>>> Key1: "Value1\r\nKey2: Value2"
>>>
>>> will actually translate into two separate headers being sent back to
>>> client.
>>>
>>> For a header of:
>>>
>>> Key3: "Value3a\r\nValue3b"
>>>
>>> in a WSGI adapter which simply passes things through, the client would
>>> get an invalid header line, which in general it would ignore. If
>>> however this was generated when hosted with a CGI-WSGI adapter, for
>>> Apache at least, Apache would generate a 500 error itself due to
>>> detecting a header line of invalid format.
>>>
>>> Thus, is an embedded newline in value invalid? Would it be reasonable
>>> for a WSGI adapter to flag it as an error?
>>>
>>
>> I might be reading the spec wrong but it doesn't seem to be forbidden by
>> RFC 2616.
>>
>> Section 4.2 says:
>>
>>> Any LWS that occurs between field-content MAY be replaced with a single
>> SP before interpreting the field value or forwarding the message
>> downstream.
>>
>> Then a look at the definition of separators shows us that SP is a valid
>> separator.
>>
>> Since section 2.1 says:
>>
>>> Except where noted otherwise, linear white space (LWS) can be included
>> between any two adjacent words (token or quoted-string), and between
>> adjacent words and separators, without changing the interpretation of a
>> field.
>>
>> It sounds to me that this is a valid construct but a WSGI adapter might
>> consider converting those CRLF into simple SP as said in 2.1 again:
>>
>>> A recipient MAY replace any linear white space with a single SP before
>> interpreting the field value or forwarding the message downstream.
>
> A LWS is:
>
> LWS = [CRLF] 1*( SP | HT )
>
> I.e., not just a single CRLF, but a CRLF followed by a space or tab.
>
> Thus, can't just replace CRLF only with a space.
>
> Anyway, the wording of my question and reference to CGI was a bit
> wrong, as WSGI response headers are probably more governed by HTTP
> RFC.
>
> To clarify, what we really have is two cases, the first is return of a
> value with a valid LWS as specified by HTTP RFC.
>
> If the WSGI adapter is mapping direct to HTTP, then it can pass it
> straight through. If however the WSGI adapter hosts on top of an interface
> with CGI like semantics, then it should translate LWS to single space
> as described.
>
> The second case is an embedded CRLF which isn't followed by space or
> tab and thus isn't a LWS. This is the case which causes problems and
> am asking whether it should be detected and flagged as an erroneous
> response.
>

You might want to take the question to the HTTP-BIS charter and follow-up on that issue:

http://tools.ietf.org/wg/httpbis/trac/ticket/30

- Sylvain

--
Sylvain Hellegouarch
http://www.defuze.org

From pywebsig at xhaus.com Thu Jun 12 11:06:42 2008
From: pywebsig at xhaus.com (Alan Kennedy)
Date: Thu, 12 Jun 2008 10:06:42 +0100
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
Message-ID: <4a951aa00806120206s35d7d989v9c701ab2f582ca94@mail.gmail.com>

[Graham]
> Thus, is an embedded newline in value invalid? Would it be reasonable
> for a WSGI adapter to flag it as an error?

From a security POV, it may be advisable for WSGI servers to *not* allow newlines in HTTP response headers; newlines in response headers may be the result of an application's failure to sanitise its inputs.

http://en.wikipedia.org/wiki/HTTP_response_splitting

Regards,

Alan.

From paul at boddie.org.uk Thu Jun 12 20:30:04 2008
From: paul at boddie.org.uk (Paul Boddie)
Date: Thu, 12 Jun 2008 20:30:04 +0200
Subject: [Web-SIG] Web Talks at EuroPython 2008
Message-ID: <200806122030.04096.paul@boddie.org.uk>

Hello again,

Following up on my previous mail about EuroPython 2008 (the European Python community conference), the organisers have now made the conference timetable available, and there are quite a few interesting talks of relevance to Python-oriented Web developers: Django, Grok, LAX (Logilab AppEngine eXtension), Plone, Pylons and Zope 3 all get some coverage this year, with Jython also being shown as an option for Web application development and deployment.

So, for anyone reading this in Europe (or with European travel plans next month), why not plan a trip to Vilnius, Lithuania if you haven't already done so? More details can be found on the EuroPython site:

http://www.europython.org/

Talks will take place from Monday 7th July until Wednesday 9th July with sprints taking place afterwards until Saturday 12th July. I look forward to seeing some of you there!

Paul

From orsenthil at gmail.com Mon Jun 16 05:23:41 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Mon, 16 Jun 2008 08:53:41 +0530
Subject: [Web-SIG] urllib package addressing PEP 3108
Message-ID: <20080616032340.GA16198@gmail.com>

Hello All,

According to PEP 3108, the new urllib package will consist of request.py (urllib2.py and url handling functions from urllib (URLOpener, FancyURLOpener)) and then parse.py (urlparse.py and parsing related methods from urllib). http://bugs.python.org/issue2885 tracks the package creation.

Current urllib.py exposes the following methods.

__all__ = ["urlopen", "URLopener", "FancyURLopener", "urlretrieve",
           "urlcleanup", "quote", "quote_plus", "unquote", "unquote_plus",
           "urlencode", "url2pathname", "pathname2url", "splittag",
           "localhost", "thishost", "ftperrors", "basejoin", "unwrap",
           "splittype", "splithost", "splituser", "splitpasswd", "splitport",
           "splitnport", "splitquery", "splitattr", "splitvalue",
           "getproxies"]

Now the task is to divide them into request.py and parse.py.

1) urlopen method. Both urllib.py and urllib2.py currently have this method, urllib one takes proxies as the last argument and urllib2 takes timeout as the last argument.
How do we have both of them?

My thought, have urllib2's urlopen, because it anyway provides the proxy handling through handlers and discard urllib's urlopen method.

Comments please?

Now, splitting the methods to request.py and parse.py

request.py - urlopen (urllib2's), URLopener, FancyURLopener, urlretrieve, urlcleanup, localhost, thishost, ftperrors, getproxies.

parse.py - quote, quote_plus, unquote, unquote_plus, urlencode, url2pathname, pathname2url, splittag, basejoin, unwrap, splittype, splithost, splituser, splitpasswd, splitport, splitnport, splitquery, splitattr, splitvalue

This to me looks like a major split up of the module and will involve code changes across the two a lot.

When deciding upon the PEP 3108 for urllib package, was this the thought process?
Is my split up theoretically correct? Do you have any suggestions?

Thanks,
Senthil

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From orsenthil at gmail.com Wed Jun 18 20:03:32 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Wed, 18 Jun 2008 23:33:32 +0530
Subject: [Web-SIG] urllib package addressing PEP 3108
In-Reply-To:
References: <20080616032340.GA16198@gmail.com>
Message-ID: <20080618180331.GA3693@gmail.com>

Hi Facundo,

* Facundo Batista [2008-06-18 14:52:46]:
>
> I think Jeremy will handle this today...

I got in touch with Jeremy and we both are working together. :-)
Currently there are 4 urllib tests still failing. Trying to sort things out. We discussed the split-up and other details like a single urlopen method. Things are working out well.

Thanks,
Senthil

> > 1) urlopen method. Both urllib.py and urllib2.py currently have this method,
> > urllib one takes proxies as the last argument and urllib2 takes timeout as the
> > last argument.
> > How do we have both of them?
> >
> > My thought, have urllib2's urlopen, because it anyway provides the proxy
> > handling through handlers and discard urllib's urlopen method.
> >
> > Comments please?
>
> Which would be the drawback of accepting the proxies directly in the
> urlopen() function?
>
> Right now, to use a proxy I do:
>
> proxy = urllib2.ProxyHandler({"http":"http://www.norealproxy.com:8080"})
> opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
> urllib2.install_opener(opener)
> def ericsson_urlopen(*args):
>     return urllib2.urlopen(*args)
>
> Maybe I could use the syntax of urllib.urlopen(), and have it
> automatically do that?

We settled upon using urllib2's urlopen method.
The difference between urllib's urlopen and urllib2's urlopen was that both returned addinfourl objects, with urllib's urlopen wrapping the http client response class and urllib2's wrapping it in io.BufferedReader.

So, the settlement was: use urlopen from urllib2, but wrap it in http client class for the file like object so that things get handled for both. The bugs in test were mostly due to this and are being fixed.

Thanks,
Senthil

> > Now, splitting the methods to request.py and parse.py
> >
> > request.py - urlopen (urllib2's), URLopener, FancyURLopener, urlretrieve,
> > urlcleanup, localhost, thishost, ftperrors, getproxies.
> >
> > parse.py - quote, quote_plus, unquote, unquote_plus, urlencode, url2pathname,
> > pathname2url, splittag,basejoin,unwrap,splittype,splithost,
> > splituser, splitpasswd, splitport,splitnport,splitquery,splitattr, splitvalue
>
> +1
>
> Regards,
>
> --
> . Facundo
>
> Blog: http://www.taniquetil.com.ar/plog/
> PyAr: http://www.python.org/ar/

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From orsenthil at gmail.com Fri Jun 27 20:31:58 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sat, 28 Jun 2008 00:01:58 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
Message-ID: <20080627183158.GA4644@gmail.com>

At http://bugs.python.org/issue754016, there is a discussion wherein if a URL is given in a normal way to urlparse (For e.g. urlparse('www.python.org')), it parses it as a path rather than as the net_loc component as is the common case with browsers.

urlparse module tries to follow RFC 1808, where it is specified that:

2.4.3. Parsing the Network Location/Login

If the parse string begins with a double-slash "//", then the substring of characters after the double-slash and up to, but not including, the next slash "/" character is the network location/login (<net_loc>) of the URL.

For treating the url as a path, the RFC specifies that after parsing scheme, net_loc, parameters and query, whatever is left is path.

2.4.6. Parsing the Path

After the above steps, all that is left of the parse string is the URL <path> and the slash "/" that may precede it.

So, when 'www.python.org' is not a scheme, net_loc (as per RFC), parameter or query, it is a path. This case looks absurd for 'www.python.org' but perfect for parsing relative urls like just 'a'. Moreover, this makes sense when we have relative urls with parameters and query, for e.g. 'g:h', '?x'

Now, the question comes as "How do we inform the users that if they want the net_loc of the url, they have to use // in the front". My suggestion is through the "Docs" and "Help" message.

There is a discussion and suggestion on raising an Exception for cases when url does not start with '//'. As urlparse module is used for handling both absolute URLs as well as relative URLs, this suggestion, IMHO, would break the urlparse handling of all relative urls. For e.g., cases which are mentioned in the RFC 1808 (Section 5.1 Normal Examples).

Another way to resolve this would be to break urlparse into two methods:

urlparse.absparse()
urlparse.relparse()

and let the user decide what he wants.

Please provide your suggestions on this.
- Is the current method okay?
- Do we feel need for absparse and relparse()?

Thanks.
Senthil

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From ianb at colorstudy.com Fri Jun 27 20:35:54 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 27 Jun 2008 13:35:54 -0500
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <20080627183158.GA4644@gmail.com>
References: <20080627183158.GA4644@gmail.com>
Message-ID: <4865330A.3090206@colorstudy.com>

O.R.Senthil Kumaran wrote:
> At http://bugs.python.org/issue754016, there is a discussion wherein if a URL
> is given in a normal way to urlparse (For e.g. urlparse('www.python.org')), it
> parses it as a path rather than as the net_loc component as is the common case
> with browsers.

Browsers interpret it as a path, e.g., <a href="www.python.org">python.org</a> will not take you to www.python.org

There are things like email clients that detect domain names and turn them into links, but detecting links in text is quite different from anything urlparse does.

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From orsenthil at gmail.com Fri Jun 27 21:01:08 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sat, 28 Jun 2008 00:31:08 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <4865330A.3090206@colorstudy.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com>
Message-ID: <20080627190108.GA4780@gmail.com>

* scriptor Ian Bicking explico:
> > At http://bugs.python.org/issue754016, there is a discussion wherein if a
> > URL is given in a normal way to urlparse (For e.g.
> > urlparse('www.python.org')), it parses it as a path rather than as the
> > net_loc component as is the common case with browsers.
>
> Browsers interpret it as a path, e.g., <a
> href="www.python.org">python.org</a> will not take you to www.python.org
>

Yes, you are right. In that case, what urlparse is currently doing is same as what browser does. :) Surprising, and I had forgotten this! :)

BTW, commonly when someone writes 'www.python.org', we tend to understand that he is referring to net_loc. Is it not?
And also, when we type 'www.python.org' at Address Location in the Browser, it automatically gets translated to http://www.python.org as the full url and www.python.org becomes net_loc in this case.

Should we consider this scenario?
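
For reference, the parsing behaviour under discussion can be reproduced with a short sketch. It assumes a Python 3 interpreter, where the urlparse module lives at `urllib.parse`; the helper name `describe` is made up for this illustration.

```python
from urllib.parse import urlparse


def describe(url):
    """Return (scheme, netloc, path, query) as urlparse reports them."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path, parts.query


# No leading "//": the whole string ends up in the path component.
bare = describe("www.python.org")

# Leading "//": the same string is taken as the network location.
slashed = describe("//www.python.org")

# A relative reference carrying only a query string.
query_only = describe("?x")
```

Comparing `bare` and `slashed` shows that the only thing moving 'www.python.org' from the path to the netloc is the leading double slash, as RFC 1808 section 2.4.3 describes.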

Thanks,
Senthil

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From fdrake at gmail.com Fri Jun 27 21:16:38 2008
From: fdrake at gmail.com (Fred Drake)
Date: Fri, 27 Jun 2008 15:16:38 -0400
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <20080627190108.GA4780@gmail.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com> <20080627190108.GA4780@gmail.com>
Message-ID: <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>

On Fri, Jun 27, 2008 at 3:01 PM, O.R.Senthil Kumaran wrote:
> BTW, commonly when someone writes 'www.python.org', we tend to understand that
> he is referring to net_loc. Is it not?
> And also, when we type 'www.python.org' at Address Location in the
> Browser, it automatically gets translated to http://www.python.org as the full
> url and www.python.org becomes net_loc in this case.

There are two cases here:

1. Relative URLs in a context that has a base URL (inside a resource loaded from a URL, or in an (X)HTML document that includes a <base> element).

2. Abbreviated URLs in a user interface that implies no context with a base URL (like the browser's address bar).

I'd suggest that these are completely different. urlsplit and urlparse support 1. If we want the second, that should be a separate function. It would be reasonable to add that to the urlparse module (urllib.parse in Python 3).

-Fred

--
Fred L. Drake, Jr.
"Chaos is the score upon which reality is written."
--Henry Miller

From fumanchu at aminus.org Fri Jun 27 21:36:33 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 27 Jun 2008 12:36:33 -0700
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com> <20080627190108.GA4780@gmail.com> <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
Message-ID:

Fred Drake wrote:
> On Fri, Jun 27, 2008 at 3:01 PM, O.R.Senthil Kumaran
> wrote:
> > BTW, commonly when someone writes 'www.python.org', we tend to
> > understand that he is referring to net_loc. Is it not?
> > And also, when we type 'www.python.org' at Address Location in the
> > Browser, it automatically gets translated to http://www.python.org as
> > the full url and www.python.org becomes net_loc in this case.
>
> There are two cases here:
>
> 1. Relative URLs in a context that has a base URL (inside a resource
> loaded from a URL, or in an (X)HTML document that includes a <base>
> element).
>
> 2. Abbreviated URLs in a user interface that implies no context with a
> base URL (like the browser's address bar).
>
> I'd suggest that these are completely different. urlsplit and
> urlparse support 1. If we want the second, that should be a separate
> function. It would be reasonable to add that to the urlparse module
> (urllib.parse in Python 3).

There's even a 3rd case: HTTP's Request-URI. For example, '//path' must be treated as an abs_path consisting of two path_segments ['', 'path'], not a net_loc, since the Request_URI must be one of ("*" | absoluteURI | abs_path | authority).

Robert Brewer
fumanchu at aminus.org

See http://www.cherrypy.org/browser/branches/815-urljoin/cherrypy/wsgiserver/__init__.py#L247 for an implementation.
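
The difference between the generic rules and the Request-URI case can be sketched as below. This is not the CherryPy code linked above: `split_abs_path` is a hypothetical helper written for this illustration, it covers only the abs_path alternative (not "*", absoluteURI or authority), and it assumes Python 3's `urllib.parse`.

```python
from urllib.parse import urlsplit


def split_abs_path(request_uri):
    """Hypothetical helper: split an HTTP abs_path into (segments, query).

    The leading "/" always starts the path here, so "//path" is never
    read as a network location.
    """
    if not request_uri.startswith("/"):
        raise ValueError("not an abs_path: %r" % request_uri)
    path, _, query = request_uri.partition("?")
    # Drop the leading slash, then split the remainder into segments.
    return path[1:].split("/"), query


# Generic URL rules: "//path" yields a network location and an empty path.
generic = urlsplit("//path")

# Request-URI rules: the same string is a path with an empty first segment.
segments, query = split_abs_path("//path")
```

Under these assumptions `generic.netloc` is 'path', while `segments` comes out as the two path segments described in the message.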

From orsenthil at gmail.com Sun Jun 29 13:32:53 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sun, 29 Jun 2008 17:02:53 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com> <20080627190108.GA4780@gmail.com> <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
Message-ID: <20080629113253.GA3291@gmail.com>

* scriptor Fred Drake, explico
> 2. Abbreviated URLs in a user interface that implies no context with a
> base URL (like the browser's address bar).
>
> I'd suggest that these are completely different. urlsplit and
> urlparse support 1. If we want the second, that should be a separate
> function. It would be reasonable to add that to the urlparse module
> (urllib.parse in Python 3).
>

Thanks for the clarification. That sums things up.

I seek a consensus on the need for an "Abbreviated URL" handling function. Do we need this in urllib.parse/urlparse library?

In that case the specifications of how this function should behave will need to be defined by us.

One advantage I can see is, when people provide an "abbreviated url", then the result of parsing it into path and netloc would be proper as per their (commonly held) expectations.

Anything else?

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From orsenthil at gmail.com Sun Jun 29 13:43:03 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sun, 29 Jun 2008 17:13:03 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To:
References: <20080627183158.GA4644@gmail.com> <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
Message-ID: <20080629114303.GA14128@gmail.com>

* scriptor Robert Brewer, explico
>
> There's even a 3rd case: HTTP's Request-URI.
> For example, '//path' must
> be treated as an abs_path consisting of two path_segments ['', 'path'],
> not a net_loc, since the Request_URI must be one of ("*" | absoluteURI |
> abs_path | authority).
>
> See
> http://www.cherrypy.org/browser/branches/815-urljoin/cherrypy/wsgiserver/__init__.py#L247
> for an implementation.

Thanks for passing on this note and the example. Gives an idea of changes required in urlparse modules for RFC 2396 compliance.

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From facundobatista at gmail.com Wed Jun 18 19:52:55 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Wed, 18 Jun 2008 17:52:55 -0000
Subject: [Web-SIG] urllib package addressing PEP 3108
In-Reply-To: <20080616032340.GA16198@gmail.com>
References: <20080616032340.GA16198@gmail.com>
Message-ID:

2008/6/16 O.R.Senthil Kumaran :

> (urllib2.py and url handling functions from urllib (URLOpener, FancyURLOpener))
> and then parse.py (urlparse.py and parsing related methods from urllib).
> http://bugs.python.org/issue2885 tracks the package creation.

I think Jeremy will handle this today...

O.R., did you make some of this work? Can you help Jeremy somehow?

> 1) urlopen method. Both urllib.py and urllib2.py currently have this method,
> urllib one takes proxies as the last argument and urllib2 takes timeout as the
> last argument.
> How do we have both of them?
>
> My thought, have urllib2's urlopen, because it anyway provides the proxy
> handling through handlers and discard urllib's urlopen method.
>
> Comments please?

Which would be the drawback of accepting the proxies directly in the urlopen() function?

Right now, to use a proxy I do:

proxy = urllib2.ProxyHandler({"http":"http://www.norealproxy.com:8080"})
opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
urllib2.install_opener(opener)
def ericsson_urlopen(*args):
    return urllib2.urlopen(*args)

Maybe I could use the syntax of urllib.urlopen(), and have it automatically do that?

> Now, splitting the methods to request.py and parse.py
>
> request.py - urlopen (urllib2's), URLopener, FancyURLopener, urlretrieve,
> urlcleanup, localhost, thishost, ftperrors, getproxies.
>
> parse.py - quote, quote_plus, unquote, unquote_plus, urlencode, url2pathname,
> pathname2url, splittag,basejoin,unwrap,splittype,splithost,
> splituser, splitpasswd, splitport,splitnport,splitquery,splitattr, splitvalue

+1

Regards,

--
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
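
For comparison, the handler-based proxy setup from this thread might look as follows under the Python 3 module layout that PEP 3108 describes. This is a minimal sketch rather than anything agreed on the list: `make_proxy_opener` is a hypothetical helper name, and the proxy URL is the placeholder used in the message above. Building the opener does not contact the network.

```python
import urllib.request


def make_proxy_opener(proxies):
    """Hypothetical helper: build an opener that routes through `proxies`.

    `proxies` maps a scheme to a proxy URL, e.g. {"http": "http://host:8080"}.
    """
    handler = urllib.request.ProxyHandler(proxies)
    # build_opener() adds the default handlers, including HTTPHandler.
    return urllib.request.build_opener(handler)


opener = make_proxy_opener({"http": "http://www.norealproxy.com:8080"})

# Optional: make this opener the default for urllib.request.urlopen().
# urllib.request.install_opener(opener)
```

Calling `opener.open(url)` keeps the proxy choice local to that opener, whereas `install_opener` changes it for every later `urlopen()` call in the process.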