blocking forever with urllib
Naris Siamwalla
naris at ensim.com
Fri Aug 31 00:56:35 EDT 2001
I ran into this problem when some web servers would accept
the connection but then just sit there and do nothing. To fix
this we added a select() call to httplib's getreply() (Python 1.5.x):
import select
import socket

# Wait up to 30 seconds for the server to send something
# before trying to read the reply line.
(r, w, e) = select.select([self.sock], [], [], 30.0)
if r:
    line = self.file.readline()
else:
    raise socket.error, "select timeout"
I don't know how correct this is, but it worked for us.
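
If you want to try the same idea outside httplib, here is a
minimal standalone sketch; the function name fetch_with_timeout,
the 30-second default, and the hard-coded port 80 are my own
choices for illustration, not anything from the library:

import select
import socket
import string

def fetch_with_timeout(host, path, timeout=30.0):
    # Open a TCP connection and send a bare HTTP/1.0 request.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, 80))
    s.send("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host))
    data = []
    while 1:
        # Block at most `timeout` seconds waiting for more data.
        (r, w, e) = select.select([s], [], [], timeout)
        if not r:
            s.close()
            raise socket.error, "select timeout"
        chunk = s.recv(4096)
        if not chunk:  # server closed the connection
            break
        data.append(chunk)
    s.close()
    return string.join(data, "")

Note that this only guards the reads: a connect() or DNS lookup
that hangs will still block, which may be what urlopen() is
actually stuck on in your case.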
msoulier at storm.ca (Michael P. Soulier) wrote in message news:<Ttzj7.58472$n75.14648351 at news4.rdc1.on.home.com>...
> Hey people.
>
> I'm writing a web crawler as an exercise, using urllib and htmllib to
> recursively crawl through the pages. Whenever urllib.urlopen() throws an
> IOError exception, the URL gets flagged as a broken link.
>
> Unfortunately, urllib.urlopen() is blocking for some time on one URL. When
> I do an nslookup on it, it times out within a few seconds, since it's a URL
> from our internal intranet at work and is not accessible from the internet.
> However, urllib.urlopen() takes forever to return.
>
> Is there a way to specify a timeout for this library? I can't find a way
> in the documentation.
>
> Thanks,
>
> Mike