From graham.dumpleton at gmail.com  Thu Jun 12 10:02:38 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 12 Jun 2008 18:02:38 +1000
Subject: [Web-SIG] Newline values in WSGI response header values.
Message-ID: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>

Can anyone confirm for me what the behaviour should be if someone
includes a newline in the value of a WSGI response header?

The CGI specification would seem to disallow it, and thus a WSGI adapter
should by rights possibly produce an error if user code does it.

At the moment I know of no WSGI adapter implementation which validates
whether a newline appears in the value of a WSGI response header. For
many WSGI adapters this means that a header of:

  Key1: "Value1\r\nKey2: Value2"

will actually translate into two separate headers being sent back to the client.

For a header of:

  Key3: "Value3a\r\nValue3b"

in a WSGI adapter which simply passes things through, the client would
get an invalid header line, which in general it would ignore. If,
however, this was generated when hosted with a CGI-WSGI adapter, then for
Apache at least, Apache would generate a 500 error itself after
detecting a header line of invalid format.

Thus, is an embedded newline in value invalid? Would it be reasonable
for a WSGI adapter to flag it as an error?
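
To make the two header examples above concrete, here is a minimal Python
sketch of what a pass-through adapter effectively does. The function name
serialize_headers is hypothetical and is not taken from any existing
adapter; it only assumes that headers are written out as "name: value"
lines terminated by CRLF.

```python
def serialize_headers(headers):
    """Join (name, value) pairs into header lines with no validation."""
    return "".join("%s: %s\r\n" % (name, value) for name, value in headers)


def wire_lines(headers):
    """Return the individual lines a client would see on the wire."""
    return serialize_headers(headers).split("\r\n")[:-1]


# First example: one header from the application becomes two on the wire.
split_case = wire_lines([("Key1", "Value1\r\nKey2: Value2")])

# Second example: the second line has no "name:" part, so it is not a
# well-formed header line.
invalid_case = wire_lines([("Key3", "Value3a\r\nValue3b")])
```

In the first case the client receives a Key2 header the application never
declared as a separate header; in the second it receives a line that is
neither a header nor a continuation.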

Thanks.

Graham

From sh at defuze.org  Thu Jun 12 10:22:51 2008
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Thu, 12 Jun 2008 10:22:51 +0200 (CEST)
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
Message-ID: <57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>


> Can anyone confirm for me what the behaviour should be if someone
> includes a newline in the value of a WSGI response header?
>
> CGI specification would seem to disallow it and thus WSGI adapter
> should by rights possibly produce an error if user code does it.
>
> At the moment I know of no WSGI adapter implementation which validates
> whether a newline appears in the value of a WSGI response header. For
> many WSGI adapters this means that a header of:
>
>   Key1: "Value1\r\nKey2: Value2"
>
> will actually translate into two separate headers being sent back to
> client.
>
> For a header of:
>
>   Key3: "Value3a\r\nValue3b"
>
> in a WSGI adapter which simply passes things through, the client would
> get an invalid header line, which in general it would ignore. If
> however this was generated when hosted with a CGI-WSGI adapter, for
> Apache at least, Apache would generate a 500 error itself due to
> detecting a header line of invalid format.
>
> Thus, is an embedded newline in value invalid? Would it be reasonable
> for a WSGI adapter to flag it as an error?
>

I might be reading the spec wrong but it doesn't seem to be forbidden by
RFC 2616.

Section 4.2 says:

> Any LWS that occurs between field-content MAY be replaced with a single
> SP before interpreting the field value or forwarding the message
> downstream.

A look at the definition of separators then shows that SP is a valid
separator.

Section 2.1 says:

> Except where noted otherwise, linear white space (LWS) can be included
> between any two adjacent words (token or quoted-string), and between
> adjacent words and separators, without changing the interpretation of a
> field.

So it sounds to me like a valid construct, though a WSGI adapter might
consider converting those CRLFs into a single SP, as section 2.2 allows:

> A recipient MAY replace any linear white space with a single SP before
> interpreting the field value or forwarding the message downstream.


- Sylvain

-- 
Sylvain Hellegouarch
http://www.defuze.org

From graham.dumpleton at gmail.com  Thu Jun 12 10:38:17 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 12 Jun 2008 18:38:17 +1000
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
	<57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>
Message-ID: <88e286470806120138n6f77332cm22f563267b0e17f2@mail.gmail.com>

2008/6/12 Sylvain Hellegouarch <sh at defuze.org>:
>
>> Can anyone confirm for me what the behaviour should be if someone
>> includes a newline in the value of a WSGI response header?
>>
>> CGI specification would seem to disallow it and thus WSGI adapter
>> should by rights possibly produce an error if user code does it.
>>
>> At the moment I know of no WSGI adapter implementation which validates
>> whether a newline appears in the value of a WSGI response header. For
>> many WSGI adapters this means that a header of:
>>
>>   Key1: "Value1\r\nKey2: Value2"
>>
>> will actually translate into two separate headers being sent back to
>> client.
>>
>> For a header of:
>>
>>   Key3: "Value3a\r\nValue3b"
>>
>> in a WSGI adapter which simply passes things through, the client would
>> get an invalid header line, which in general it would ignore. If
>> however this was generated when hosted with a CGI-WSGI adapter, for
>> Apache at least, Apache would generate a 500 error itself due to
>> detecting a header line of invalid format.
>>
>> Thus, is an embedded newline in value invalid? Would it be reasonable
>> for a WSGI adapter to flag it as an error?
>>
>
> I might be reading the spec wrong but it doesn't seem to be forbidden by
> RFC 2616.
>
> Section 4.2 says:
>
>> Any LWS that occurs between field-content MAY be replaced with a single
> SP before interpreting the field value or forwarding the message
> downstream.
>
> Then a look at the definition of separators shows us that SP is a valid
> separator.
>
> Since section 2.1 tells:
>
>> Except where noted otherwise, linear white space (LWS) can be included
> between any two adjacent words (token or quoted-string), and between
> adjacent words and separators, without changing the interpretation of a
> field.
>
> It sounds to me that this is a valid construct but a WSGI adapter might
> consider converting those CRLF into simple SP as said in 2.1 again:
>
>> A recipient MAY replace any linear white space with a single SP before
> interpreting the field value or forwarding the message downstream.

A LWS is:

  LWS            = [CRLF] 1*( SP | HT )

I.e., not just a single CRLF, but a CRLF followed by a space or tab.

Thus one can't simply replace a bare CRLF with a space.

Anyway, the wording of my question and reference to CGI was a bit
wrong, as WSGI response headers are probably more governed by HTTP
RFC.

To clarify, what we really have is two cases, the first is return of a
value with a valid LWS as specified by HTTP RFC.

If the WSGI adapter is mapping directly to HTTP, then it can pass the
value straight through. If, however, the WSGI adapter is hosted on top of
an interface with CGI-like semantics, then it should translate LWS to a
single space as described.

The second case is an embedded CRLF which isn't followed by a space or
tab and thus isn't LWS. This is the case which causes problems, and I
am asking whether it should be detected and flagged as an erroneous
response.
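
The two cases can be sketched in Python as follows. This is only an
illustration under the LWS grammar quoted above, not code from any
existing adapter; the name normalize_header_value is hypothetical, and
raising ValueError is just one possible way of flagging the error.

```python
import re

# A folded line break: CRLF followed by one or more SP or HT characters.
_FOLD = re.compile(r"\r\n[ \t]+")


def normalize_header_value(value):
    """Fold valid CRLF-based LWS to a single space; reject anything else."""
    # Case 1: CRLF followed by SP/HT is LWS and may become a single SP.
    unfolded = _FOLD.sub(" ", value)
    # Case 2: any CR or LF still present was not part of a valid fold.
    if "\r" in unfolded or "\n" in unfolded:
        raise ValueError("bare CR or LF in header value: %r" % (value,))
    return unfolded
```

With this, "Value3a\r\n Value3b" comes back as "Value3a Value3b", while
"Value3a\r\nValue3b" and "Value1\r\nKey2: Value2" are both rejected.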

Graham

From sh at defuze.org  Thu Jun 12 10:58:09 2008
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Thu, 12 Jun 2008 10:58:09 +0200 (CEST)
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <88e286470806120138n6f77332cm22f563267b0e17f2@mail.gmail.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
	<57133.195.101.247.164.1213258971.squirrel@mail1.webfaction.com>
	<88e286470806120138n6f77332cm22f563267b0e17f2@mail.gmail.com>
Message-ID: <42418.195.101.247.164.1213261089.squirrel@mail1.webfaction.com>


> 2008/6/12 Sylvain Hellegouarch <sh at defuze.org>:
>>
>>> Can anyone confirm for me what the behaviour should be if someone
>>> includes a newline in the value of a WSGI response header?
>>>
>>> CGI specification would seem to disallow it and thus WSGI adapter
>>> should by rights possibly produce an error if user code does it.
>>>
>>> At the moment I know of no WSGI adapter implementation which validates
>>> whether a newline appears in the value of a WSGI response header. For
>>> many WSGI adapters this means that a header of:
>>>
>>>   Key1: "Value1\r\nKey2: Value2"
>>>
>>> will actually translate into two separate headers being sent back to
>>> client.
>>>
>>> For a header of:
>>>
>>>   Key3: "Value3a\r\nValue3b"
>>>
>>> in a WSGI adapter which simply passes things through, the client would
>>> get an invalid header line, which in general it would ignore. If
>>> however this was generated when hosted with a CGI-WSGI adapter, for
>>> Apache at least, Apache would generate a 500 error itself due to
>>> detecting a header line of invalid format.
>>>
>>> Thus, is an embedded newline in value invalid? Would it be reasonable
>>> for a WSGI adapter to flag it as an error?
>>>
>>
>> I might be reading the spec wrong but it doesn't seem to be forbidden by
>> RFC 2616.
>>
>> Section 4.2 says:
>>
>>> Any LWS that occurs between field-content MAY be replaced with a single
>> SP before interpreting the field value or forwarding the message
>> downstream.
>>
>> Then a look at the definition of separators shows us that SP is a valid
>> separator.
>>
>> Since section 2.1 tells:
>>
>>> Except where noted otherwise, linear white space (LWS) can be included
>> between any two adjacent words (token or quoted-string), and between
>> adjacent words and separators, without changing the interpretation of a
>> field.
>>
>> It sounds to me that this is a valid construct but a WSGI adapter might
>> consider converting those CRLF into simple SP as said in 2.1 again:
>>
>>> A recipient MAY replace any linear white space with a single SP before
>> interpreting the field value or forwarding the message downstream.
>
> A LWS is:
>
>   LWS            = [CRLF] 1*( SP | HT )
>
> Ie, not just a single CRLF, but a CRLF followed by a space or tab.
>
> Thus, can't just replace CRLF only with a space.
>
> Anyway, the wording of my question and reference to CGI was a bit
> wrong, as WSGI response headers are probably more governed by HTTP
> RFC.
>
> To clarify, what we really have is two cases, the first is return of a
> value with a valid LWS as specified by HTTP RFC.
>
> If the WSGI adapter is mapping directly to HTTP, then it can pass the
> value straight through. If, however, the WSGI adapter is hosted on top of
> an interface with CGI-like semantics, then it should translate LWS to a
> single space as described.
>
> The second case is an embedded CRLF which isn't followed by a space or
> tab and thus isn't LWS. This is the case which causes problems, and I
> am asking whether it should be detected and flagged as an erroneous
> response.
>

You might want to take the question to the HTTPbis working group and
follow up on this issue:

http://tools.ietf.org/wg/httpbis/trac/ticket/30

- Sylvain


-- 
Sylvain Hellegouarch
http://www.defuze.org

From pywebsig at xhaus.com  Thu Jun 12 11:06:42 2008
From: pywebsig at xhaus.com (Alan Kennedy)
Date: Thu, 12 Jun 2008 10:06:42 +0100
Subject: [Web-SIG] Newline values in WSGI response header values.
In-Reply-To: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
References: <88e286470806120102qa6b3f44k7fe5731e45dad962@mail.gmail.com>
Message-ID: <4a951aa00806120206s35d7d989v9c701ab2f582ca94@mail.gmail.com>

[Graham]
> Thus, is an embedded newline in value invalid? Would it be reasonable
> for a WSGI adapter to flag it as an error?

From a security POV, it may be advisable for WSGI servers to *not*
allow newlines in HTTP response headers; newlines in response headers
may be the result of an application's failure to sanitise its inputs.

http://en.wikipedia.org/wiki/HTTP_response_splitting
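
A server or middleware could enforce this by checking headers when
start_response is called. The following Python sketch is one hedged way
to do it; the name reject_header_newlines is hypothetical, and rejecting
with ValueError is an assumption rather than behaviour required by the
WSGI specification.

```python
def reject_header_newlines(app):
    """Wrap a WSGI application so CR or LF in response headers is an error."""

    def wrapped(environ, start_response):
        def checked_start_response(status, headers, exc_info=None):
            for name, value in headers:
                combined = name + value
                if "\r" in combined or "\n" in combined:
                    # Possibly unsanitised input reaching a header.
                    raise ValueError(
                        "CR or LF in response header %r" % (name,))
            return start_response(status, headers, exc_info)

        return app(environ, checked_start_response)

    return wrapped
```

An application that copies request data into, say, a Location header
would then fail loudly instead of emitting an attacker-controlled extra
header.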

Regards,

Alan.

From paul at boddie.org.uk  Thu Jun 12 20:30:04 2008
From: paul at boddie.org.uk (Paul Boddie)
Date: Thu, 12 Jun 2008 20:30:04 +0200
Subject: [Web-SIG] Web Talks at EuroPython 2008
Message-ID: <200806122030.04096.paul@boddie.org.uk>

Hello again,

Following up on my previous mail about EuroPython 2008 (the European Python 
community conference), the organisers have now made the conference timetable 
available, and there are quite a few interesting talks of relevance to 
Python-oriented Web developers: Django, Grok, LAX (Logilab AppEngine 
eXtension), Plone, Pylons and Zope 3 all get some coverage this year, with 
Jython also being shown as an option for Web application development and 
deployment.

So, for anyone reading this in Europe (or with European travel plans next 
month), why not plan a trip to Vilnius, Lithuania if you haven't already done 
so? More details can be found on the EuroPython site:

  http://www.europython.org/

Talks will take place from Monday 7th July until Wednesday 9th July with 
sprints taking place afterwards until Saturday 12th July.

I look forward to seeing some of you there!

Paul

From orsenthil at gmail.com  Mon Jun 16 05:23:41 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Mon, 16 Jun 2008 08:53:41 +0530
Subject: [Web-SIG] urllib package addressing PEP 3108
Message-ID: <20080616032340.GA16198@gmail.com>

Hello All,

According to PEP 3108, the new urllib package will consist of request.py
(urllib2.py plus the URL handling classes from urllib: URLopener and
FancyURLopener) and parse.py (urlparse.py plus the parsing-related functions
from urllib). http://bugs.python.org/issue2885 tracks the package creation.

Current urllib.py exposes the following methods.

__all__ = ["urlopen", "URLopener", "FancyURLopener", "urlretrieve",
           "urlcleanup", "quote", "quote_plus", "unquote", "unquote_plus",
           "urlencode", "url2pathname", "pathname2url", "splittag",
           "localhost", "thishost", "ftperrors", "basejoin", "unwrap",
           "splittype", "splithost", "splituser", "splitpasswd", "splitport",
           "splitnport", "splitquery", "splitattr", "splitvalue",
           "getproxies"]

Now the task is to divide them into request.py and parse.py.

1) urlopen method. Both urllib.py and urllib2.py currently have this method;
the urllib one takes proxies as its last argument and the urllib2 one takes
timeout as its last argument.
How do we have both of them?

My thought: keep urllib2's urlopen, because it already provides proxy
handling through handlers, and discard urllib's urlopen method.

Comments please?

Now, splitting the methods to request.py and parse.py

request.py - urlopen (urllib2's), URLopener, FancyURLopener, urlretrieve,
             urlcleanup, localhost, thishost, ftperrors, getproxies.

parse.py -   quote, quote_plus, unquote, unquote_plus, urlencode, url2pathname,
             pathname2url, splittag, basejoin, unwrap, splittype, splithost,
             splituser, splitpasswd, splitport, splitnport, splitquery,
             splitattr, splitvalue
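
For orientation, this is roughly how such a split looks from the caller's
side. The sketch below uses the urllib.request and urllib.parse modules
as they exist in released Python 3, where the layout differs slightly
from the proposal above: url2pathname and pathname2url live in
urllib.request rather than urllib.parse. The sample strings are made up
for illustration.

```python
# Quoting and encoding helpers are in urllib.parse.
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

# Opening URLs and path/URL conversion helpers are in urllib.request.
from urllib.request import getproxies, pathname2url, url2pathname, urlopen

quoted = quote("a b/c")              # "/" is left unescaped by default
plus_quoted = quote_plus("a b&c")    # spaces become "+", "&" is escaped
round_trip = unquote_plus(plus_quoted)
query = urlencode({"q": "web sig"})
```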


To me this looks like a major split of the module, and it will involve a
lot of code changes across the two new modules.

When PEP 3108 was decided upon for the urllib package, was this the
intended approach?

Is my split up theoretically correct? Do you have any suggestions?

Thanks,
Senthil


-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From orsenthil at gmail.com  Wed Jun 18 20:03:32 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Wed, 18 Jun 2008 23:33:32 +0530
Subject: [Web-SIG] urllib package addressing PEP 3108
In-Reply-To: <e04bdf310806181052l488a6813he255409be16d165b@mail.gmail.com>
References: <20080616032340.GA16198@gmail.com>
	<e04bdf310806181052l488a6813he255409be16d165b@mail.gmail.com>
Message-ID: <20080618180331.GA3693@gmail.com>

Hi Facundo,

* Facundo Batista <facundobatista at gmail.com> [2008-06-18 14:52:46]:
> 
> I think Jeremy will handle this today...


I got in touch with Jeremy and we are working on it together. :-)
Currently there are 4 urllib tests still failing, and we are trying to sort
things out. We discussed the split-up and other details, such as having a
single urlopen method.

Things are working out well.

Thanks,
Senthil


> > 1) urlopen method. Both urllib.py and urllib2.py currently have this method,
> > urllib one takes proxies as the last argument and urllib2 takes timeout as the
> > last argument.
> > How do we have both of them?
> >
> > My thought, have urllib2's urlopen, because it anyway provides the proxy
> > handling through handlers and discard urllib's urlopen method.
> >
> > Comments please?
> 
> Which would be the drawback of accepting the proxies directly in the
> urlopen() function?
> 
> Right now, to use a proxy I do:
> 
> proxy = urllib2.ProxyHandler({"http":"http://www.norealproxy.com:8080"})
> opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
> urllib2.install_opener(opener)
> def ericsson_urlopen(*args):
>     return urllib2.urlopen(*args)
> 
> Maybe I could use the syntax of urllib.urlopen(), and that it
> automatically to do that?

We settled on using urllib2's urlopen method. The difference between
urllib's urlopen and urllib2's urlopen was in what they returned: both
returned addinfourl objects, but urllib's wrapped the http client response
class, while urllib2's wrapped an io.BufferedReader.

So the settlement was: use urlopen from urllib2, but wrap the http client
response class as the file-like object so that both cases are handled.

The test failures were mostly due to this and are being fixed.

Thanks,
Senthil


> > Now, splitting the methods to request.py and parse.py
> >
> > request.py - urlopen (urllib2's), URLopener, FancyURLopener, urlretrieve,
> >             urlcleanup, localhost, thishost, ftperrors, getproxies.
> >
> > parse.py -   quote, quote_plus, unquote, unquote_plus, urlencode, url2pathname,
> >             pathname2url, splittag, basejoin, unwrap, splittype, splithost,
> >             splituser, splitpasswd, splitport, splitnport, splitquery,
> >             splitattr, splitvalue
> 
> +1
> 
> Regards,
> 
> -- 
> . Facundo
> 
> Blog: http://www.taniquetil.com.ar/plog/
> PyAr: http://www.python.org/ar/

-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From orsenthil at gmail.com  Fri Jun 27 20:31:58 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sat, 28 Jun 2008 00:01:58 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
Message-ID: <20080627183158.GA4644@gmail.com>

At http://bugs.python.org/issue754016, there is a discussion of the fact that
when a URL is written the everyday way (e.g. urlparse('www.python.org')),
urlparse parses it as a path rather than as the net_loc component, as is
commonly the case with browsers.

urlparse module tries to follow RFC 1808, where it is specified that:

<quote_rfc1808>
2.4.3.  Parsing the Network Location/Login

   If the parse string begins with a double-slash "//", then the
   substring of characters after the double-slash and up to, but not
   including, the next slash "/" character is the network location/login
   (<net_loc>) of the URL.  

</quote_rfc1808>

As for treating the URL as a path, the RFC specifies that once the scheme,
net_loc, parameters and query have been parsed, whatever is left is the path.

<quote_rfc1808>
2.4.6.  Parsing the Path

   After the above steps, all that is left of the parse string is the
   URL <path> and the slash "/" that may precede it. 
</quote_rfc1808>

So, since 'www.python.org' is not a scheme, net_loc (as per the RFC),
parameter or query, it is a path. This looks absurd for 'www.python.org' but
is exactly right for parsing relative URLs such as plain 'a'. Moreover, it
makes sense for other relative forms mentioned in the RFC, e.g. 'g:h' and '?x'.
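
The behaviour described above can be checked directly. This is a small
sketch using urllib.parse.urlparse from Python 3 (the urlparse module in
Python 2 behaves the same way for these inputs); the variable names are
only for illustration.

```python
from urllib.parse import urlparse

bare = urlparse("www.python.org")       # no "//", so everything is path
slashed = urlparse("//www.python.org")  # leading "//" marks the net_loc
relative = urlparse("a")                # a plain relative path
query_only = urlparse("?x")             # a relative reference with only a query
```

Here bare.netloc is empty and bare.path is 'www.python.org', while
slashed.netloc is 'www.python.org' and slashed.path is empty.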

Now the question becomes: how do we inform users that if they want the
net_loc of the URL, they have to put '//' in front?

My suggestion is through the "Docs" and "Help" message.

There is also a suggestion in that discussion to raise an exception when the
URL does not start with '//'.

As the urlparse module is used for handling both absolute and relative URLs,
this suggestion would, IMHO, break urlparse's handling of all relative URLs,
for example the cases listed in RFC 1808 (Section 5.1, Normal Examples).
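
For reference, a few of those relative forms resolved against the base
URL used in RFC 1808 section 5, as a sketch with urllib.parse.urljoin
from Python 3. All of these inputs lack a leading '//' except one, so an
exception for that case would reject most of them.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q#f"  # base URL from the RFC's examples

resolved = {
    ref: urljoin(BASE, ref)
    for ref in ("g", "../g", "/g", "//g", "g:h")
}
```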

Another way to resolve this would be to break urlparse into two methods:
urlparse.absparse()
urlparse.relparse() 
and let the user decide what he wants.

Please provide your suggestions on this.
- Is the current method okay?
- Do we feel a need for absparse() and relparse()?


Thanks.
Senthil
-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From ianb at colorstudy.com  Fri Jun 27 20:35:54 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 27 Jun 2008 13:35:54 -0500
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <20080627183158.GA4644@gmail.com>
References: <20080627183158.GA4644@gmail.com>
Message-ID: <4865330A.3090206@colorstudy.com>

O.R.Senthil Kumaran wrote:
> At http://bugs.python.org/issue754016, there is a discussion wherein if a URL
> is given in a normal way to urlparse (For e.g. urlparse('www.python.org')), it
> parses it as a path rather than as the net_loc component as is the common case
> with browsers.

Browsers interpret it as a path, e.g., <a 
href="www.python.org">python.org</a> will not take you to www.python.org

There are things like email clients that detect domain names and turn 
them into links, but detecting links in text is quite different from 
anything urlparse does.

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From orsenthil at gmail.com  Fri Jun 27 21:01:08 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sat, 28 Jun 2008 00:31:08 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <4865330A.3090206@colorstudy.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com>
Message-ID: <20080627190108.GA4780@gmail.com>

* scriptor Ian Bicking explico:

> > At http://bugs.python.org/issue754016, there is a discussion wherein if a 
> > URL
> > is given in a normal way to urlparse (For e.g. urlparse('www.python.org')), 
> > it
> > parses it as a path rather than as the net_loc component as is the common 
> > case
> > with browsers.
> 
>  Browsers interpret it as a path, e.g., <a 
>  href="www.python.org">python.org</a> will not take you to www.python.org
> 

Yes, you are right. In that case, what urlparse currently does is the same as
what a browser does. :) A surprise, and I had forgotten this! :)

BTW, when someone writes 'www.python.org', we commonly understand that
he is referring to the net_loc, do we not?
Also, when we type 'www.python.org' into the address bar of the browser, it
is automatically translated to http://www.python.org as the full URL, and
www.python.org becomes the net_loc in this case.

Should we consider this scenario?

Thanks,
Senthil

-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From fdrake at gmail.com  Fri Jun 27 21:16:38 2008
From: fdrake at gmail.com (Fred Drake)
Date: Fri, 27 Jun 2008 15:16:38 -0400
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <20080627190108.GA4780@gmail.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com>
	<20080627190108.GA4780@gmail.com>
Message-ID: <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>

On Fri, Jun 27, 2008 at 3:01 PM, O.R.Senthil Kumaran
<orsenthil at gmail.com> wrote:
> BTW, commonly when someone writes 'www.python.org', we tend to understand that
> he is referring to net_loc. Is it not?
> And also, when we type 'www.python.org' at Address Location in the
> Browser, it automatically gets translated to http://www.python.org as the full
> url and www.python.org becomes net_loc in this case.

There are two cases here:

1. Relative URLs in a context that has a base URL (inside a resource
loaded from a URL, or in an (X)HTML document that includes a <base>
element).

2. Abbreviated URLs in a user interface that implies no context with a
base URL (like the browser's address bar).

I'd suggest that these are completely different.  urlsplit and
urlparse support 1.  If we want the second, that should be a separate
function.  It would be reasonable to add that to the urlparse module
(urllib.parse in Python 3).
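
As a rough illustration of what such a separate function might look
like, here is a Python 3 sketch. The name split_abbreviated and the
default scheme of "http" are assumptions made for this example; nothing
like it exists in urlparse or urllib.parse, and it deliberately ignores
harder inputs such as 'host:port' or 'mailto:' forms.

```python
from urllib.parse import urlsplit


def split_abbreviated(text, default_scheme="http"):
    """Split address-bar style input, treating a bare name as a net_loc."""
    # Inputs without "scheme://" or a leading "//" get "//" prepended so
    # that urlsplit sees the first component as the network location.
    if "://" not in text and not text.startswith("//"):
        text = "//" + text
    return urlsplit(text, scheme=default_scheme)


parts = split_abbreviated("www.python.org/doc")
```

Here parts.netloc is 'www.python.org' and parts.path is '/doc', while
urlsplit and urlparse keep their existing behaviour for relative URLs.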


 -Fred

-- 
Fred L. Drake, Jr. <fdrake at gmail.com>
"Chaos is the score upon which reality is written." --Henry Miller

From fumanchu at aminus.org  Fri Jun 27 21:36:33 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 27 Jun 2008 12:36:33 -0700
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
References: <20080627183158.GA4644@gmail.com>
	<4865330A.3090206@colorstudy.com><20080627190108.GA4780@gmail.com>
	<9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6403CDCCBA@ex10.hostedexchange.local>

Fred Drake wrote:
> On Fri, Jun 27, 2008 at 3:01 PM, O.R.Senthil Kumaran
> <orsenthil at gmail.com> wrote:
> > BTW, commonly when someone writes 'www.python.org', we tend to
> > understand that he is referring to net_loc. Is it not?
> > And also, when we type 'www.python.org' at Address Location in the
> > Browser, it automatically gets translated to http://www.python.org
> > as the full url and www.python.org becomes net_loc in this case.
> 
> There are two cases here:
> 
> 1. Relative URLs in a context that has a base URL (inside a resource
> loaded from a URL, or in an (X)HTML document that includes a <base>
> element).
> 
> 2. Abbreviated URLs in a user interface that implies no context with a
> base URL (like the browser's address bar).
> 
> I'd suggest that these are completely different.  urlsplit and
> urlparse support 1.  If we want the second, that should be a separate
> function.  It would be reasonable to add that to the urlparse module
> (urllib.parse in Python 3).

There's even a 3rd case: HTTP's Request-URI. For example, '//path' must
be treated as an abs_path consisting of two path_segments ['', 'path'],
not a net_loc, since the Request-URI must be one of ("*" | absoluteURI |
abs_path | authority).
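
The difference can be sketched in Python 3 as below. This is not the
CherryPy implementation; split_request_uri is a hypothetical name, and
the classification is deliberately simplistic (it does no validation of
the absoluteURI or authority forms).

```python
from urllib.parse import urlsplit


def split_request_uri(uri):
    """Classify a Request-URI as "*", abs_path, absoluteURI or authority."""
    if uri == "*":
        return ("*", None)
    if uri.startswith("/"):
        # abs_path: drop any query, then split into path segments.
        path = uri.partition("?")[0]
        return ("abs_path", path.split("/")[1:])
    if "://" in uri:
        return ("absoluteURI", uri)
    return ("authority", uri)


generic = urlsplit("//path")             # generic parsing sees a net_loc
as_request = split_request_uri("//path")  # Request-URI parsing sees a path
```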


Robert Brewer
fumanchu at aminus.org

See
http://www.cherrypy.org/browser/branches/815-urljoin/cherrypy/wsgiserver/__init__.py#L247
for an implementation.

From orsenthil at gmail.com  Sun Jun 29 13:32:53 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sun, 29 Jun 2008 17:02:53 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
References: <20080627183158.GA4644@gmail.com> <4865330A.3090206@colorstudy.com>
	<20080627190108.GA4780@gmail.com>
	<9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
Message-ID: <20080629113253.GA3291@gmail.com>

* scriptor Fred Drake, explico 
> 2. Abbreviated URLs in a user interface that implies no context with a
> base URL (like the browser's address bar).
> 
> I'd suggest that these are completely different.  urlsplit and
> urlparse support 1.  If we want the second, that should be a separate
> function.  It would be reasonable to add that to the urlparse module
> (urllib.parse in Python 3).
> 

Thanks for the clarification. That sums things up.

I seek a consensus on the need for an "abbreviated URL" handling function.
Do we need this in the urllib.parse/urlparse library?

If so, we will need to define the specification of how this function
should behave ourselves.

One advantage I can see is that when people provide an "abbreviated URL",
the result of parsing it into path and netloc would match their (commonly
held) expectations.

Anything else?


-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From orsenthil at gmail.com  Sun Jun 29 13:43:03 2008
From: orsenthil at gmail.com (O.R.Senthil Kumaran)
Date: Sun, 29 Jun 2008 17:13:03 +0530
Subject: [Web-SIG] urlparse method behaviour when handing abs/rel urls
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6403CDCCBA@ex10.hostedexchange.local>
References: <20080627183158.GA4644@gmail.com>
	<9cee7ab80806271216m6a89a4e4u8eaa20d4a6a0519f@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6403CDCCBA@ex10.hostedexchange.local>
Message-ID: <20080629114303.GA14128@gmail.com>

* scriptor Robert Brewer, explico 
> 
> There's even a 3rd case: HTTP's Request-URI. For example, '//path' must
> be treated as an abs_path consisting of two path_segments ['', 'path'],
> not a net_loc, since the Request_URI must be one of ("*" | absoluteURI |
> abs_path | authority).
> 
> See
> http://www.cherrypy.org/browser/branches/815-urljoin/cherrypy/wsgiserver/__init__.py#L247
> for an implementation.

Thanks for passing on this note and the example.
It gives an idea of the changes required in the urlparse module for
RFC 2396 compliance.

-- 
O.R.Senthil Kumaran
http://uthcode.sarovar.org

From facundobatista at gmail.com  Wed Jun 18 19:52:55 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Wed, 18 Jun 2008 17:52:55 -0000
Subject: [Web-SIG] urllib package addressing PEP 3108
In-Reply-To: <20080616032340.GA16198@gmail.com>
References: <20080616032340.GA16198@gmail.com>
Message-ID: <e04bdf310806181052l488a6813he255409be16d165b@mail.gmail.com>

2008/6/16 O.R.Senthil Kumaran <orsenthil at gmail.com>:

> (urllib2.py and url handling functions from urllib (URLOpener, FancyURLOpener)
> and then parse.py ( urlparse.py and parsing related methods from urllib).
> http://bugs.python.org/issue2885 tracks the package creation.

I think Jeremy will handle this today...

O.R., have you done some of this work? Can you help Jeremy somehow?


> 1) urlopen method. Both urllib.py and urllib2.py currently have this method,
> urllib one takes proxies as the last argument and urllib2 takes timeout as the
> last argument.
> How do we have both of them?
>
> My thought, have urllib2's urlopen, because it anyway provides the proxy
> handling through handlers and discard urllib's urlopen method.
>
> Comments please?

What would be the drawback of accepting the proxies directly in the
urlopen() function?

Right now, to use a proxy I do:

proxy = urllib2.ProxyHandler({"http":"http://www.norealproxy.com:8080"})
opener = urllib2.build_opener(proxy, urllib2.HTTPHandler)
urllib2.install_opener(opener)
def ericsson_urlopen(*args):
    return urllib2.urlopen(*args)

Maybe I could use the syntax of urllib.urlopen(), and have it do that
automatically?
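
For comparison, a hedged sketch of the same idea against the Python 3
urllib.request module, without installing a global opener. The helper
name opener_with_proxies is hypothetical, and the proxy URL is the
placeholder from the snippet above, so nothing is actually fetched here.

```python
from urllib.request import ProxyHandler, build_opener


def opener_with_proxies(proxies):
    """Build a private opener that routes requests through the given proxies."""
    # build_opener adds the default handlers (HTTP, redirects, etc.) itself
    # and uses this ProxyHandler instead of the default one.
    return build_opener(ProxyHandler(proxies))


opener = opener_with_proxies({"http": "http://www.norealproxy.com:8080"})
# opener.open(url) would then go through the proxy; it is not called in
# this sketch because the proxy host is only a placeholder.
```

A urlopen() that accepted proxies directly could be a thin wrapper
around a helper like this.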


> Now, splitting the methods to request.py and parse.py
>
> request.py - urlopen (urllib2's), URLopener, FancyURLopener, urlretrieve,
>             urlcleanup, localhost, thishost, ftperrors, getproxies.
>
> parse.py -   quote, quote_plus, unquote, unquote_plus, urlencode, url2pathname,
>             pathname2url, splittag, basejoin, unwrap, splittype, splithost,
>             splituser, splitpasswd, splitport, splitnport, splitquery,
>             splitattr, splitvalue

+1

Regards,

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/