blocking forever with urllib

Michael P. Soulier msoulier at
Fri Aug 31 00:53:39 CEST 2001

    Hey people. 

    I'm writing a web crawler as an exercise, using urllib and htmllib to
recursively crawl through the pages. Whenever urllib.urlopen() raises an
IOError exception, the URL gets flagged as a broken link. 

    Unfortunately, urllib.urlopen() is blocking for some time on one URL. When
I do an nslookup on it, it times out within a few seconds, since it's a URL
from our internal intranet at work and is not accessible from the internet.
However, urllib.urlopen() takes forever to return. 

    Is there a way to specify a timeout for this library? I can't find a way
in the documentation. 
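[Editor's note: later Python releases grew exactly this knob. Since Python 2.3, socket.setdefaulttimeout() sets a process-wide timeout for all new sockets, and in Python 3, urllib.request.urlopen() accepts a per-call timeout argument. A minimal sketch of the crawler's broken-link check using the modern API (fetch() and the example URL are illustrative, not from the original post):

```python
import socket
import urllib.error
import urllib.request

def fetch(url, timeout=5):
    """Return the page body, or None if the URL is broken or times out.

    urlopen()'s timeout is in seconds; a timeout or DNS failure inside
    the request surfaces as URLError (or a plain OSError).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        return None

# Alternatively, set a process-wide default for every new socket
# (available since Python 2.3):
socket.setdefaulttimeout(5)
```

With this, an unreachable intranet host fails within the timeout instead of blocking the crawl indefinitely.]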



Michael P. Soulier <michael.soulier at>
"Pretty soon, massive bloat is the industry standard and everyone is using
huge, buggy programs not even their developers can love."
    -Eric S. Raymond, The Art of Unix Programming
