[ python-Bugs-947571 ] urllib.urlopen() fails to raise exception

SourceForge.net noreply at sourceforge.net
Sat Jul 10 20:25:24 CEST 2004


Bugs item #947571, was opened at 2004-05-04 02:57
Message generated for change (Comment added) made by mike_j_brown
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=947571&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: M.-A. Lemburg (lemburg)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.urlopen() fails to raise exception

Initial Comment:
I've come across a strange problem: even though
the docs say that urllib.urlopen() should raise an IOError
for server errors (e.g. 404s), all versions of Python that
I've tested (1.5.2 - 2.3) fail to do so.

Example:
>>> import urllib
>>> f = urllib.urlopen('http://www.example.net/this-url-does-not-exist/')
>>> print f.read()
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /this-url-does-not-exist/ was not found on this server.<P>
<HR>
<ADDRESS>Apache/1.3.27 Server at www.example.com Port 80</ADDRESS>
</BODY></HTML>

Either the docs are wrong, or the implementation has a
really long-standing bug, or I am missing something.

----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-07-10 11:25

Message:
Logged In: YES 
user_id=371366

In urllib.FancyURLopener, which is the class used by 
urllib.urlopen(), there is this override of URLopener's 
http_error_default:

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handling -- don't raise an exception."""
        return addinfourl(fp, headers, "http:" + url)

I don't see how this is really all that desirable, but 
nevertheless it appears to be quite deliberate.
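
To make the effect of that override concrete, here is a minimal sketch. It is a toy model, not the real urllib source: ToyURLopener and ToyFancyURLopener are hypothetical stand-ins, the status code is passed in by hand instead of coming from a server, and plain strings take the place of response objects.

```python
class ToyURLopener(object):
    """Hypothetical stand-in for urllib.URLopener (simplified)."""

    def open(self, url, errcode, errmsg):
        # Assumption: the status code is supplied by the caller,
        # so no network access is needed.
        if errcode == 200:
            return "body of " + url
        return self.http_error(url, errcode, errmsg)

    def http_error(self, url, errcode, errmsg):
        # Simplified: go straight to the default handler.
        return self.http_error_default(url, errcode, errmsg)

    def http_error_default(self, url, errcode, errmsg):
        # Base behaviour: an HTTP error becomes an exception.
        raise IOError("http error %d: %s" % (errcode, errmsg))


class ToyFancyURLopener(ToyURLopener):
    """Hypothetical stand-in for urllib.FancyURLopener (simplified)."""

    def http_error_default(self, url, errcode, errmsg):
        # Mirrors the quoted override: return the error page
        # as if it were an ordinary response, without raising.
        return "error page for " + url


url = "http://www.example.net/this-url-does-not-exist/"

# The "fancy" opener hands back the error page silently.
page = ToyFancyURLopener().open(url, 404, "Not Found")

# The base opener raises for the same status code.
try:
    ToyURLopener().open(url, 404, "Not Found")
    base_raised = False
except IOError:
    base_raised = True
```

In this model, whether a 404 surfaces as an exception depends only on which http_error_default ends up being called, which matches the behaviour reported above.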

It looks like the intent in urlopen is that if you want to use 
some other opener besides an instance of FancyURLopener, 
you can set urllib._urlopener. This seems to work:

>>> import urllib
>>> class MyUrlOpener(urllib.FancyURLopener):
...     def http_error_default(*args, **kwargs):
...         return urllib.URLopener.http_error_default(*args, **kwargs)
... 
>>> urllib._urlopener = MyUrlOpener()
>>> urllib.urlopen('http://www.example.com/this-url-does-not-exist/')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/urllib.py", line 76, in urlopen
    return opener.open(url)
  File "/usr/local/lib/python2.3/urllib.py", line 181, in open
    return getattr(self, name)(url)
  File "/usr/local/lib/python2.3/urllib.py", line 306, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python2.3/urllib.py", line 323, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "<stdin>", line 3, in http_error_default
  File "/usr/local/lib/python2.3/urllib.py", line 329, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 404, 'Not Found', <httplib.HTTPMessage instance at 0x836298c>)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2004-06-02 11:29

Message:
Logged In: YES 
user_id=89016

This seems to work with urllib2:
>>> import urllib2
>>> f = urllib2.urlopen('http://www.example.net/this-url-does-not-exist/')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "/usr/local/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
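
For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same exception can be caught and inspected. The following is only a sketch under that assumption: fetch_status and fake_opener are hypothetical names introduced here, and fake_opener stands in for urllib.request.urlopen so that the example runs without a network connection.

```python
import io
import urllib.error


def fetch_status(opener, url):
    """Call opener(url); report an HTTPError as a value instead of raising."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        # err.code and err.msg carry the status line, e.g. 404 / "Not Found".
        return False, (err.code, err.msg)
    return True, response


def fake_opener(url):
    # Hypothetical stand-in for urllib.request.urlopen: always fails
    # with a 404, using an empty body and empty headers.
    raise urllib.error.HTTPError(url, 404, "Not Found", {}, io.BytesIO(b""))


ok, detail = fetch_status(
    fake_opener, "http://www.example.net/this-url-does-not-exist/"
)
```

Because HTTPError derives from IOError, code written to catch the IOError that the docs promise would also catch this exception.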

----------------------------------------------------------------------
