Check URL --> Simply?
Garth Grimm
garth_grimm at hp.com
Wed Aug 15 14:00:32 EDT 2001
That's about the best solution possible.
Unfortunately, some dynamic sites trap what should be a 404 and
instead return a 200 along with a valid web page document.
Broadvision is one notable culprit: it is often configured so that a
request for a page that doesn't exist is answered with the home page of
the web site.
As far as a human user is concerned, that is a dead link, but it's next
to impossible to detect in programming code. You end up having to build
customized solutions in cases like this.
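One such customized workaround (a sketch only; the function name, the
similarity threshold, and the sample pages below are my own invention, not
anything Broadvision-specific) is to fetch a deliberately bogus URL from the
same site, then compare its body to the body of the page you are checking.
If the two are nearly identical, the "found" page is probably just the
site's catch-all home page, i.e. a soft 404:

```python
import difflib

def looks_like_soft_404(page_body, bogus_body, threshold=0.9):
    """Heuristic soft-404 check: if the page under test is nearly
    identical to what the site serves for a known-bogus URL, assume
    the site is masking a missing page.  The 0.9 threshold is an
    arbitrary starting point and will need tuning per site."""
    ratio = difflib.SequenceMatcher(None, page_body, bogus_body).ratio()
    return ratio >= threshold

# Hypothetical sample bodies: a site that answers every miss with
# its home page, versus a genuinely distinct page.
home = "<html><body>Welcome to Example Corp!</body></html>"
real = "<html><body>Quarterly report, figures, details...</body></html>"

print(looks_like_soft_404(home, home))   # the "dead" link served the home page
print(looks_like_soft_404(real, home))   # a real, distinct page
```

In practice you would fetch `bogus_body` once per site (from a URL that
cannot plausibly exist) and reuse it for every link on that site; dynamic
content such as timestamps or ads will keep the ratio below 1.0 even for
true soft 404s, which is why the threshold is deliberately loose.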
--
Garth
"Andy McKay" <andym at ActiveState.com> wrote in message
news:mailman.997892665.9674.python-list at python.org...
> Just check the headers; this means you don't get the document, saving
> bandwidth and solving the 404 file problem (assuming they set the right
> headers)
>
> from httplib import HTTP
> from urlparse import urlparse
>
> def checkURL(url):
>     p = urlparse(url)
>     h = HTTP(p[1])
>     h.putrequest('HEAD', p[2])
>     h.endheaders()
>     if h.getreply()[0] == 200: return 1
>     else: return 0
>
> if __name__ == '__main__':
>     assert checkURL('http://slashdot.org')
>     assert not checkURL('http://slashdot.org/notadirectory')
>
> --
> Andy McKay
>
>