[Web-SIG] python bug issue2464

Sidnei da Silva sidnei at enfoldsystems.com
Wed Aug 13 17:25:43 CEST 2008


I also noticed that there's a Set-Cookie header in there. If you're
not handling cookies that could potentially cause some trouble too,
though I suspect this is not the problem here.
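
In case it helps rule cookies out, here is a minimal sketch of an opener that
keeps cookies across the redirects. It assumes Python 3, where urllib2 became
urllib.request and cookielib became http.cookiejar; the helper name
make_cookie_opener is only for illustration and is not part of the library.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the jar collects cookies set along the way."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# opener.open("http://www.wikispaces.com/") would send back any cookies
# received earlier in the redirect chain (network call left commented out).
```

If the 400 still shows up with an opener like this, cookies are probably not
the cause.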

On Wed, Aug 13, 2008 at 11:17 AM, Jean-Paul Calderone
<exarkun at divmod.com> wrote:
> On Wed, 13 Aug 2008 18:14:19 +0530, "O.R.Senthil Kumaran"
> <orsenthil at gmail.com> wrote:
>>
>> I am trying to write a fix for this bug http://bugs.python.org/issue2464
>> - urllib2 can't handle http://www.wikispaces.com
>>
>> What is actually happening here is:
>>
>> 1) urllib2 tries to open http://www.wikispaces.com
>> 2) It gets 302 Redirected to
>>
>> https://session.wikispaces.com/session/auth?authToken=1bd8784307f89a495cc1aafb075c4983
>> 3) It again gets 302 Redirected to:
>> http://www.wikispaces.com?responseToken=1bd8784307f89a495cc1aafb075c4983
>>
>> After this, it gets a 200 code, but when the page is retrieved it is a
>> 400 Bad Request!
>>
>> Firefox has NO problem in getting the actual page though.
>>
>> Here is the output of the session (I added a print of header.items() to
>> the http_error_302 method in HTTPRedirectHandler):
>>
>> >>> obj1 = urllib2.urlopen("http://www.wikispaces.com")
>>
>> [('content-length', '0'), ('x-whom', 'w9-prod-http, p1'),
>>  ('set-cookie', 'slave=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/, test=1; expires=Wed, 13-Aug-2008 13:03:51 GMT; path=/'),
>>  ('server', 'nginx/0.6.30'), ('connection', 'close'),
>>  ('location', 'https://session.wikispaces.com/session/auth?authToken=4b3eecb5c1ab301689e446cf03b3a585'),
>>  ('date', 'Wed, 13 Aug 2008 12:33:51 GMT'),
>>  ('p3p', 'CP: ALL DSP COR CURa ADMa DEVa CONo OUR IND ONL COM NAV INT CNT STA'),
>>  ('content-type', 'text/html; charset=utf-8')]
>> [('content-length', '0'), ('x-whom', 'w8-prod-https, p1'),
>>  ('set-cookie', 'master=1; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/, master=7de5d46e15fd23b1ddf782c565d4fb3a; expires=Thu, 14-Aug-2008 13:03:53 GMT; path=/; domain=session.wikispaces.com'),
>>  ('server', 'nginx/0.6.30'), ('connection', 'close'),
>>  ('location', 'http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585'),
>>  ('date', 'Wed, 13 Aug 2008 12:33:53 GMT'),
>>  ('p3p', 'CP: ALL DSP COR CURa ADMa DEVa CONo OUR IND ONL COM NAV INT CNT STA'),
>>  ('content-type', 'text/html; charset=utf-8')]
>> >>>
>> >>> print obj1.geturl()
>>
>> http://www.wikispaces.com?responseToken=4b3eecb5c1ab301689e446cf03b3a585
>> >>>
>> >>> print obj1.code
>>
>> 200
>> >>>
>> >>> print obj1.headers
>>
>> >>> print obj1.info()
>>
>> >>> print obj1.read()
>>
>> <html>
>> <head><title>400 Bad Request</title></head>
>> <body bgcolor="white">
>> <center><h1>400 Bad Request</h1></center>
>> <hr><center>nginx/0.6.30</center>
>> </body>
>> </html>
>>
>> While urllib2 fails like this, Firefox is able to handle the site
>> properly.
>> I also notice that if I suffix the URL with a dummy path, say
>> url = "http://www.wikispaces.com/dummy_url_path", the urlopen request
>> still goes through 302-302-200, but with dummy_url_path appended in the
>> redirections, and then read() succeeds!
>>
>> Please share your opinion on where you think urllib2 is going wrong
>> here! I am not able to drill down to the fault point.
>> This has NOT got anything to do with null characters in the redirection
>> URL as noted in the bug report.
>>
>
> Some things:
>
>  http://foo.com
>
> This is not a valid URL.  The correct URL for the intended location here
> is:
>
>  http://foo.com/
>
> This is the root of the problem, I suspect.  Firefox notices this problem
> and fixes it when deciding what requests to make.  For example, while
> urllib2 ultimately asks for this URL:
>
>  ?responseToken=f02a955460b2cc180e9bf1faa8efd383
>
> Firefox recognizes that this is silly and instead asks for:
>
>  /?responseToken=5007a08643c2b4dd719a8848024b2c7a
>
> The tokens are different because these are values from actual requests.
> Notice the important difference, though - Firefox's request begins with
> a /.
>
> Likely, urllib2 should do a bit more validation of its input and make
> sure it is only making requests which follow the protocol.
>
> Jean-Paul
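
The difference Jean-Paul describes can be checked without touching the
network. A minimal sketch follows, again assuming Python 3 (urllib.parse in
place of urlparse). ensure_root_path and request_target are hypothetical
helpers written for this message, not part of urllib2, and the token is made
up.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_path(url):
    """Return url with an empty path replaced by "/" (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)


def request_target(url):
    """Path plus query, i.e. what would end up on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path
    if parts.query:
        target += "?" + parts.query
    return target


redirected = "http://www.wikispaces.com?responseToken=abc123"  # made-up token
print(request_target(redirected))
print(request_target(ensure_root_path(redirected)))
```

The first target has no leading slash, which matches the request that draws
the 400; the second is the form Firefox sends. Applying something like
ensure_root_path to the redirect location before re-requesting is one
possible shape for a fix, though where it belongs in urllib2 is a separate
question.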



-- 
Sidnei da Silva
Enfold Systems http://enfoldsystems.com
Fax +1 832 201 8856 Office +1 713 942 2377 Ext 214