[Python-bugs-list] [ python-Bugs-563665 ] urllib2 can't cope with error response

Sat, 06 Jul 2002 11:49:51 -0700

Bugs item #563665, was opened at 2002-06-02 22:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=563665&group_id=5470

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Erik Demaine (edemaine)
Assigned to: Jeremy Hylton (jhylton)
Summary: urllib2 can't cope with error response

Initial Comment:
This looks similar to SF bug 216649, but with somewhat
different symptoms.  Redirection seems to cause an
AttributeError (attempt to access self.fp.read when
self.fp is None).  Simple example:

python -c "import urllib2; urllib2.urlopen('http://www.yahoo.com/promotions/mom_com97/supermom.html')"

Traceback from Python 2.2.1 attached.  Same behavior
appears with Python 2.2.

----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-06 18:49

Message:

httplib.py 1.55 now treats the page as an HTTP/0.9 response,
just like Mozilla.

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-03 16:55

Message:

Fixed the urllib2 part of the problem in CVS as rev 1.31 of
urllib2.py.  You'll now get a better error message about
what went wrong.

Still not sure what httplib should do differently.  I notice
that Mozilla renders this page with the HTTP response in the
text, including junk at the very beginning of the response.
 (The server is clearly broken.)

It would probably be best if httplib treated this as an
HTTP/0.9 response if there appears to be a valid message
body.  It looks like that's what Mozilla is doing.
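
The fallback suggested above can be sketched roughly as follows. This is only an illustration of the idea, not the code in httplib.py; the function name read_response, its return shape, and the sample body bytes are assumptions made up for the example. If the first line does not look like a status line, the whole payload is treated as an HTTP/0.9 body (HTTP/0.9 has no status line or headers), junk included.

```python
import re


def read_response(raw):
    """Split raw server bytes into (version, status, body).

    Hypothetical helper for illustration only; not part of httplib.
    """
    first_line, _, _ = raw.partition(b"\n")
    fields = first_line.split(None, 2)
    if (len(fields) >= 2
            and fields[0].startswith(b"HTTP/")
            and fields[1].isdigit()):
        # Normal case: the header block ends at the first blank line.
        parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
        body = parts[1] if len(parts) == 2 else b""
        return fields[0].decode("ascii", "replace"), int(fields[1]), body
    # Fallback: no recognizable status line, so assume HTTP/0.9.
    # Everything received, leading junk included, becomes the body.
    return "HTTP/0.9", 200, raw


# The junk prefix is the one from the trace; the body after it is invented.
broken = b"#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n\n<html>hi</html>"
normal = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"

print(read_response(broken)[:2])
print(read_response(normal))
```

With this approach the caller sees the junk and the embedded status line as part of the page text, which matches what Mozilla displays for this URL.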

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-03 16:17

Message:

I haven't looked at 216649 yet, but this particular
traceback is caused by a problem loading the redirected URL.
If you load
http://promotions.yahoo.com/promotions/mom_com97/supermom.html,
you'll see the same failure without invoking any redirect
machinery.

My first guess is that the Yahoo server is sending an
invalid response and that httplib isn't being generous enough
in skipping the garbage and looking for the valid response
data.  Here's a brief trace of httplib activity:
>>> import httplib
>>> h = httplib.HTTP('promotions.yahoo.com')
>>> h.set_debuglevel(2)
>>> h.putrequest("GET /promotions/mom_com97/supermom.html")   
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: putrequest() takes at least 3 arguments (2 given)
>>> h.putrequest("GET", "/promotions/mom_com97/supermom.html")
connect: (promotions.yahoo.com, 80)
send: 'GET /promotions/mom_com97/supermom.html HTTP/1.0\r\n'
>>> h.endheaders()
send: '\r\n'
>>> h.getreply()
reply: '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n'
(-1, '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n', None)

Not sure what the text starting with a hash is all about.
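
To see why the reply above comes back with a status of -1, here is a minimal sketch of a strict status-line check. It is not the httplib implementation; parse_status_line is a hypothetical helper that only mimics the (-1, line, None)-style failure shown in the trace. The junk bytes are not whitespace, so they stay glued to the front of "HTTP/1.0" and the first field no longer starts with "HTTP/".

```python
def parse_status_line(line):
    """Strictly parse a status line into (version, status, reason).

    Hypothetical helper, not httplib code. On a malformed line it
    returns (None, -1, line), echoing the -1 seen in the trace.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        return (None, -1, line)
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) > 2 else ""
    if not code.isdigit() or not 100 <= int(code) <= 999:
        return (None, -1, line)
    return (version, int(code), reason)


# The first line is the reply from the trace; the second is a well-formed one.
bad = '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n'
good = 'HTTP/1.0 200 OK\n'

print(parse_status_line(bad)[1])
print(parse_status_line(good))
```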

Of course, urllib2 has a bug that prevents it from reporting
anything useful about this error.  That needs to be fixed.

----------------------------------------------------------------------
