[Python-bugs-list] [ python-Bugs-408085 ] urllib.py https redirect-302 bug

noreply@sourceforge.net
Sun, 25 Mar 2001 18:55:11 -0800


Bugs item #408085, was updated on 2001-03-12 17:05
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=408085&group_id=5470

Category: Python Library
Group: None
Status: Open
Priority: 5
Submitted By: Dustin Boswell (boswell)
Assigned to: Moshe Zadka (moshez)
Summary: urllib.py https redirect-302 bug

Initial Comment:
Using urllib.urlopen("https://...") seems
to hang because of a redirect problem. It looks
like it's trying to follow the redirect with
http, not https.

>>> import urllib 
>>> params = ... 
>>> f = urllib.urlopen("https://...", params) 
connect: (securesite.com, 80) 
#a printout from httplib, line 354 

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.0/urllib.py", line 63, in urlopen
    return _urlopener.open(url, data)
  File "/usr/local/lib/python2.0/urllib.py", line 168, in open
    return getattr(self, name)(url, data)
  File "/usr/local/lib/python2.0/urllib.py", line 367, in open_https
    data)
  File "/usr/local/lib/python2.0/urllib.py", line 301, in http_error
    result = method(url, fp, errcode, errmsg, headers, data)
  File "/usr/local/lib/python2.0/urllib.py", line 537, in http_error_302
    return self.open(newurl, data)
  File "/usr/local/lib/python2.0/urllib.py", line 168, in open
    return getattr(self, name)(url, data)
  File "/usr/local/lib/python2.0/urllib.py", line 269, in open_http
    h.putrequest('POST', selector)
  File "/usr/local/lib/python2.0/httplib.py", line 428, in putrequest
    self.send(str)
  File "/usr/local/lib/python2.0/httplib.py", line 370, in send
    self.connect()
  File "/usr/local/lib/python2.0/httplib.py", line 354, in connect
    self.sock.connect((self.host, self.port))
KeyboardInterrupt
>>> 

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-03-25 18:55

Message:
Logged In: NO 

The Location header must be an absolute URI
(RFC 2616 section 14.30 and RFC 1945 section 10.11).
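
As a rough illustration of that distinction, the sketch below checks
whether a Location value carries its own scheme. It is written for
modern Python 3 (urllib.parse), not the Python 2.0 urllib discussed in
this report, and is_absolute_uri is a hypothetical helper name, not
something urllib provides.

```python
from urllib.parse import urlsplit


def is_absolute_uri(location):
    """Return True if the Location value includes a scheme such as https.

    Hypothetical helper for illustration only.
    """
    return bool(urlsplit(location).scheme)


# An absolute URI names its scheme; a relative reference does not.
print(is_absolute_uri("https://trading.etrade.com/"))
print(is_absolute_uri("/some/relative/path"))
```

A client that wants to tolerate relative Location values, as many do,
would have to resolve them against the URL it originally requested.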

----------------------------------------------------------------------

Comment By: Dustin Boswell (boswell)
Date: 2001-03-19 05:12

Message:
Logged In: YES 
user_id=153527

The server is https://trading.etrade.com

Unless you have an account there to try it yourself,
there's not much else specific information I can give you.

I know for sure that the redirection is to another
https URL.  The "Location" header is actually a relative
one, which is where the bug in urllib.py is.  The problem
is that when open_https gets an error response, it calls
http_error, which assumes the original URL was http, so
when a relative URL is encountered, it just prepends
"http:" to the front.  I can't think of an elegant fix
for this.  Maybe when http_error realizes it's a relative
location, it should take a "proto" argument (one that
doesn't exist yet) and prepend THAT instead...

def open_https(self, url, data=None):
    # ... (request setup omitted from this excerpt) ...
    if errcode == 200:
        return addinfourl(fp, headers, url)
    else:
        # Any non-200 response goes to the shared http_error handler
        if data is None:
            return self.http_error(url, fp, errcode, errmsg, headers)
        else:
            return self.http_error(url, fp, errcode, errmsg, headers, data)

... and here's the function called after the error is
realized...

def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
    """Error 302 -- relocated (temporarily)."""
    ###### Here's the problem ######
    # In case the server sent a relative URL, join with original:
    newurl = basejoin("http:" + url, newurl)
    # uh, what if it isn't http? we seem to have lost that information...
    if data is None:
        return self.open(newurl)
    else:
        return self.open(newurl, data)
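
To make the "proto" idea concrete, here is a minimal sketch of what
passing the scheme along might look like. It is written for modern
Python 3 (urllib.parse.urljoin) rather than the Python 2.0 basejoin
above, and resolve_redirect and its scheme argument are hypothetical
names for illustration, not an actual urllib API. It assumes, as in the
excerpt above, that the handler sees the URL with its scheme already
stripped (e.g. "//host/path").

```python
from urllib.parse import urljoin


def resolve_redirect(scheme, url_without_scheme, location):
    """Join a possibly relative Location value with the original URL.

    scheme is the protocol of the original request ("http" or "https");
    url_without_scheme looks like "//host/path". Both names are
    assumptions made for this sketch.
    """
    base = scheme + ":" + url_without_scheme
    return urljoin(base, location)


# Hard-coding "http" sends a relative redirect to the plain-http site:
print(resolve_redirect("http", "//securesite.com/login", "/account"))
# Passing the real scheme keeps the redirect on https:
print(resolve_redirect("https", "//securesite.com/login", "/account"))
```

An absolute Location value is unaffected by the base, so only relative
redirects would change behavior under this approach.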

I was originally developing my project in Java and
had it working, but realized that I was re-inventing
the wheel (i.e. redirection handling), so I switched to
Python (for other reasons too).  I went back and
used a POST instead of a GET in the redirection handling
and everything still worked, so the possible GET vs.
POST redirect server bug wasn't the cause (although that's
very interesting to know...).

Am I making any sense?

----------------------------------------------------------------------

Comment By: Moshe Zadka (moshez)
Date: 2001-03-18 01:13

Message:
Logged In: YES 
user_id=11645

Errr....I'm not sure I see the bug. Perhaps the "Location"
header actually contained an "http://" URL? If you can give
me the site or more information (like a printout of newurl),
I can probably be of more help.

In testing (sadly, against a server inside a firewall, so I
cannot give the URL) I have found that it seems to work.

One thing, that may or may not have to do with your problem:
when POSTing, a 302 means "POST to that other URL", not
"GET that other URL". Many webserver writers seem to ignore
this, and many browsers compensate for that server bug.
urllib2 does *not* compensate for that bug -- I haven't
thought through whether *that* may be the explanation.
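
The two behaviors described above can be sketched roughly as follows.
This is only an illustration: method_after_302 and its browser_compat
flag are hypothetical names, not part of urllib or urllib2.

```python
def method_after_302(original_method, browser_compat=False):
    """Return the HTTP method to use when following a 302 response.

    Hypothetical helper for illustration only.
    """
    method = original_method.upper()
    if browser_compat and method == "POST":
        # What many browsers do: re-request the new URL with GET.
        return "GET"
    # The reading given above: repeat the same method at the new URL.
    return method


print(method_after_302("POST"))
print(method_after_302("POST", browser_compat=True))
```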

----------------------------------------------------------------------
