blocking forever with urllib

Michael P. Soulier msoulier at storm.ca
Thu Aug 30 18:53:39 EDT 2001


    Hey people. 

    I'm writing a web crawler as an exercise, using urllib and htmllib to
recursively crawl through the pages. Whenever urllib.urlopen() throws an
IOError exception, the URL gets flagged as a broken link. 
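
    For what it's worth, the relevant part of the crawler looks roughly like
this (a simplified sketch; check_page() is just an illustrative name):

    import urllib, htmllib, formatter

    def check_page(url):
        # Fetch the page; an IOError means the link gets flagged as broken.
        try:
            page = urllib.urlopen(url)
            html = page.read()
            page.close()
        except IOError:
            return None          # broken link
        # Collect the anchors on the page so they can be crawled next.
        parser = htmllib.HTMLParser(formatter.NullFormatter())
        parser.feed(html)
        parser.close()
        return parser.anchorlist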

    Unfortunately, urllib.urlopen() is blocking indefinitely on one URL. When
I do an nslookup on the host, it times out within a few seconds, since it's a
URL from our internal intranet at work and is not accessible from the internet.
However, urllib.urlopen() never seems to return. 

    Is there a way to specify a timeout for this library? I can't find one in
the documentation. 
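
    The best workaround I can think of is to interrupt the blocking call with
SIGALRM, roughly like the sketch below (Unix-only, main thread only, and
untested; urlopen_with_timeout is just a name I made up for illustration):

    import signal, urllib

    class Timeout(Exception):
        pass

    def _alarm_handler(signum, frame):
        raise Timeout

    def urlopen_with_timeout(url, seconds=30):
        # Arrange for SIGALRM to interrupt the blocking call after `seconds`.
        old_handler = signal.signal(signal.SIGALRM, _alarm_handler)
        signal.alarm(seconds)
        try:
            return urllib.urlopen(url)
        finally:
            signal.alarm(0)      # cancel any pending alarm
            signal.signal(signal.SIGALRM, old_handler)

    The Timeout exception could then be caught alongside IOError and the URL
flagged as broken, but that feels like a hack compared to a real per-call
timeout in urllib itself.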

    Thanks,

    Mike

-- 
Michael P. Soulier <michael.soulier at home.com>
"Pretty soon, massive bloat is the industry standard and everyone is using
huge, buggy programs not even their developers can love."
    -Eric S. Raymond, The Art of Unix Programming


