Check URL --> Simply?

Garth Grimm garth_grimm at
Wed Aug 15 20:00:32 CEST 2001

That's about the best solution possible.

Unfortunately, there are some dynamic sites that actually trap what should
be a 404, and instead return a valid 200 and a web page document.
Broadvision being one notable culprit, which often is configured so that a
request to a page that doesn't exist is responded to with the home page of
the web site.

As far as a human user is concerned, this is a dead link, but it's next to
impossible to detect in programming code.  You end up having to make
customized solutions in this case.


"Andy McKay" <andym at> wrote in message
news:mailman.997892665.9674.python-list at
> Just check the headers, this means you dont get the document, saving
> bandwith and solving the 404 file problem (assuming the set the right
> headers)
> from httplib import HTTP
> from urlparse import urlparse
> def checkURL(url):
>      p = urlparse(url)
>      h = HTTP(p[1])
>      h.putrequest('HEAD', p[2])
>      h.endheaders()
>      if h.getreply()[0] == 200: return 1
>      else: return 0
> if __name__ == '__main__':
>      assert checkURL('')
>      assert not checkURL('')
> --
>    Andy McKay

More information about the Python-list mailing list