blocking forever with urllib
Michael P. Soulier
msoulier at storm.ca
Fri Aug 31 00:53:39 CEST 2001
I'm writing a web crawler as an exercise, using urllib and htmllib to
recursively crawl through the pages. Whenever urllib.urlopen() throws an
IOError exception, the URL gets flagged as a broken link.
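The pattern is roughly this (a simplified sketch with an illustrative
helper name, not my exact code):

    import urllib

    def check_link(url):
        # Try to fetch the page; urlopen() raising IOError is how
        # the crawler detects a broken link.
        try:
            page = urllib.urlopen(url)
        except IOError:
            return None     # flag this URL as broken
        data = page.read()
        page.close()
        return data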
Unfortunately, urllib.urlopen() blocks for a very long time on one URL.
When I do an nslookup on its hostname, the lookup times out within a few
seconds, since it's a URL on our internal intranet at work and is not
accessible from the internet. However, urllib.urlopen() takes forever to
return.
Is there a way to specify a timeout for this library? I can't find one
in the documentation.
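The only workaround I can think of is interrupting the call with
signal.alarm, which would be Unix-only; a rough sketch (Timeout and
urlopen_with_timeout are names I made up, not anything in the library):

    import signal
    import urllib

    class Timeout(Exception):
        pass

    def _alarm_handler(signum, frame):
        raise Timeout

    def urlopen_with_timeout(url, seconds=10):
        # Arrange for SIGALRM to fire if urlopen() blocks too long.
        # Unix only; won't help on platforms without SIGALRM.
        old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
        signal.alarm(seconds)
        try:
            return urllib.urlopen(url)
        finally:
            signal.alarm(0)                          # cancel the alarm
            signal.signal(signal.SIGALRM, old_handler)

The crawler could then catch Timeout alongside IOError and flag the URL
the same way, but I'd prefer something built into the library if it
exists.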
Michael P. Soulier <michael.soulier at home.com>
"Pretty soon, massive bloat is the industry standard and everyone is using
huge, buggy programs not even their developers can love."
-Eric S. Raymond, The Art of Unix Programming