[ python-Bugs-947571 ] urllib.urlopen() fails to raise exception
SourceForge.net
noreply at sourceforge.net
Sat Jul 10 20:25:24 CEST 2004
Bugs item #947571, was opened at 2004-05-04 02:57
Message generated for change (Comment added) made by mike_j_brown
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=947571&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: M.-A. Lemburg (lemburg)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.urlopen() fails to raise exception
Initial Comment:
I've come across a strange problem: even though
the docs say that urllib.urlopen() should raise an IOError
for server errors (e.g. 404s), all versions of Python that
I've tested (1.5.2 - 2.3) fail to do so.
Example:
>>> import urllib
>>> f = urllib.urlopen('http://www.example.net/this-url-does-not-exist/')
>>> print f.read()
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /this-url-does-not-exist/ was not
found on this server.<P>
<HR>
<ADDRESS>Apache/1.3.27 Server at www.example.com Port 80</ADDRESS>
</BODY></HTML>
Either the docs are wrong, the implementation has a
really long-standing bug, or I am missing something.
----------------------------------------------------------------------
Comment By: Mike Brown (mike_j_brown)
Date: 2004-07-10 11:25
Message:
In urllib.FancyURLopener, which is the class used by
urllib.urlopen(), there is this override of URLopener's
http_error_default:
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handling -- don't raise an exception."""
        return addinfourl(fp, headers, "http:" + url)
I don't see how this is really all that desirable, but
nevertheless it appears to be quite deliberate.
It looks like the intent in urlopen is that if you want to use
some other opener besides an instance of FancyURLopener,
you can set urllib._urlopener. This seems to work:
>>> import urllib
>>> class MyUrlOpener(urllib.FancyURLopener):
...     def http_error_default(*args, **kwargs):
...         return urllib.URLopener.http_error_default(*args, **kwargs)
...
>>> urllib._urlopener = MyUrlOpener()
>>> urllib.urlopen('http://www.example.com/this-url-does-not-exist/')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/urllib.py", line 76, in urlopen
    return opener.open(url)
  File "/usr/local/lib/python2.3/urllib.py", line 181, in open
    return getattr(self, name)(url)
  File "/usr/local/lib/python2.3/urllib.py", line 306, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python2.3/urllib.py", line 323, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "<stdin>", line 3, in http_error_default
  File "/usr/local/lib/python2.3/urllib.py", line 329, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 404, 'Not Found', <httplib.HTTPMessage instance at 0x836298c>)
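
To make the dispatch described above easier to follow, here is a toy model
of it, written for current Python 3 rather than 2.3. It is only a sketch:
the Model* class names are hypothetical, the "response" is a plain string
instead of a file object, and the real URLopener raises IOError with a
tuple of arguments rather than a formatted message. The only point is to
show how overriding http_error_default decides whether a 404 comes back
as a page or as an exception.

```python
class ModelURLopener:
    """Toy stand-in for the base opener's error dispatch (not the real class)."""

    def http_error(self, url, body, errcode, errmsg, headers):
        # Try a status-specific handler such as http_error_404 first.
        handler = getattr(self, "http_error_%d" % errcode, None)
        if handler is not None:
            result = handler(url, body, errcode, errmsg, headers)
            if result:
                return result
        # Nothing handled it: fall through to the default handler.
        return self.http_error_default(url, body, errcode, errmsg, headers)

    def http_error_default(self, url, body, errcode, errmsg, headers):
        # Base behaviour in this model: an error status becomes an exception.
        raise IOError("http error %d: %s" % (errcode, errmsg))


class ModelFancyURLopener(ModelURLopener):
    def http_error_default(self, url, body, errcode, errmsg, headers):
        # "Fancy" behaviour: hand the error page back like a normal response.
        return body


class ModelStrictOpener(ModelFancyURLopener):
    def http_error_default(self, *args, **kwargs):
        # Same idea as MyUrlOpener above: restore the base class behaviour.
        return ModelURLopener.http_error_default(self, *args, **kwargs)
```

With this model, ModelFancyURLopener().http_error(...) returns the error
page for a 404, while ModelStrictOpener().http_error(...) raises IOError.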
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2004-06-02 11:29
Message:
This seems to work with urllib2:
>>> import urllib2
>>> f = urllib2.urlopen('http://www.example.net/this-url-does-not-exist/')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "/usr/local/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
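
For anyone reading this on Python 3, where urllib2's functionality lives
in urllib.request and HTTPError in urllib.error, the sketch below shows
one way to get at both the status code and the error page when the
exception is raised. The helper name open_or_report and its opener
parameter are made up for illustration; the parameter exists only so
the function can be tried without network access.

```python
import urllib.error
import urllib.request


def open_or_report(url, opener=urllib.request.urlopen):
    """Return (status, body); an HTTP error status is reported, not raised.

    `opener` is a hypothetical injection point, not part of urllib.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object: it carries the status
        # code and whatever error page the server sent.
        try:
            return err.code, err.read()
        finally:
            err.close()
    with response:
        return response.status, response.read()
```

Since HTTPError is a subclass of OSError in Python 3 (IOError is an
alias for it), code written as "except IOError" still catches it.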
----------------------------------------------------------------------