Check URL --> Simply?
garth_grimm at hp.com
Wed Aug 15 20:00:32 CEST 2001
That's about the best solution possible.
Unfortunately, there are some dynamic sites that trap what should be a
404 and instead return a valid 200 along with a web page document.
Broadvision is one notable culprit; it is often configured so that a
request for a page that doesn't exist is answered with the home page of
the web site.
As far as a human user is concerned, this is a dead link, but it's next to
impossible to detect programmatically. You end up having to write
customized solutions in cases like this.
"Andy McKay" <andym at ActiveState.com> wrote in message
news:mailman.997892665.9674.python-list at python.org...
> Just check the headers; this means you don't get the document, saving
> bandwidth and solving the 404 file problem (assuming they set the right
> status codes).
> from httplib import HTTP
> from urlparse import urlparse
>
> def checkURL(url):
>     # Split the URL; p[1] is the host, p[2] is the path.
>     p = urlparse(url)
>     h = HTTP(p[1])
>     # HEAD fetches only the headers, not the document body.
>     h.putrequest('HEAD', p[2] or '/')
>     h.endheaders()
>     # getreply() returns (status, reason, headers); check the status.
>     if h.getreply()[0] == 200: return 1
>     else: return 0
>
> if __name__ == '__main__':
>     assert checkURL('http://slashdot.org')
>     assert not checkURL('http://slashdot.org/notadirectory')
> Andy McKay