Determine Whether File Exists On HTTP Server

Troy Melhase troy at gci.net
Sat May 22 04:54:57 EDT 2004


On Saturday 22 May 2004 12:28 am, OvErboRed wrote:
> Hi, I'm trying to determine whether a given URL exists. I'm new to Python
> but I think that urllib is the tool for the job. However, if I give it a
> non-existent file, it simply returns the 404 page. Aside from grepping this
> for '404', is there a better way to do this? (Preferably, there is a
> solution that can be applied to both HTTP and FTP.) Thanks in advance.

Try urllib2.urlopen and wrap the call in a try/except block.  Unlike urllib, 
which hands back the error page, it raises urllib2.HTTPError on a 404 
response.  Here's what the unhandled exception looks like:

Python 2.3.3 (#1, May 14 2004, 09:49:22)
[GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> handle = urllib2.urlopen('http://google.com/this_page_doesnt_exist')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/lib/python2.3/urllib2.py", line 346, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 472, in http_error_302
    return self.parent.open(new)
  File "/usr/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/lib/python2.3/urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
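
The try/except itself might look something like the sketch below.  The 
function name url_exists and its opener argument are made up for 
illustration; the opener is only there so the check can be exercised without 
touching the network.  The fallback import is an assumption for readers on 
Python 3, where urlopen and HTTPError are reachable through urllib.request.  
The "except ... as" spelling needs Python 2.6 or later; on the 2.3 
interpreter shown above it would be written "except urllib2.HTTPError, err".

```python
try:
    import urllib2  # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2  # Python 3 keeps the same names here


def url_exists(url, opener=urllib2.urlopen):
    """Return True if the URL opens, False if the server answers 404.

    url_exists and opener are hypothetical names for this sketch.
    """
    try:
        handle = opener(url)
    except urllib2.HTTPError as err:
        if err.code == 404:
            return False
        # Other statuses (403, 500, ...) do not prove the file is missing,
        # so let the caller decide what to do with them.
        raise
    handle.close()
    return True
```

Calling url_exists('http://google.com/this_page_doesnt_exist') would then 
return False rather than producing the traceback above.  Only a 404 is 
treated as "does not exist"; any other HTTP error is re-raised, and FTP 
failures are not handled here at all.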

-- 
Troy Melhase, troy at gci.net
--
When Christ calls a man, he bids him come and die. - Dietrich Bonhoeffer




