blocking forever with urllib

Naris Siamwalla naris at ensim.com
Fri Aug 31 00:56:35 EDT 2001


I ran into this problem where some web servers accept the connection but
then just sit there and do nothing.  To fix it we added a select() call
to httplib's getreply() (Python 1.5.x):
        # Inside HTTP.getreply(), just before the status line is read:
        # wait up to 30 seconds for the socket to become readable instead
        # of blocking forever.  (httplib already imports socket at module
        # level, so socket.error is in scope.)
        import select
        r, w, e = select.select([self.sock], [], [], 30.0)
        if r:
            line = self.file.readline()
        else:
            raise socket.error, "select timeout"
I don't know how correct this is, but it worked for us.
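
For what it's worth, here is the same select() trick as a stand-alone
sketch on a plain socket, outside httplib, so you can see the idea on its
own.  The host name, the hand-rolled HEAD request, and the 30-second
figure are just placeholders for illustration, not anything urllib itself
provides:

    import select
    import socket

    def fetch_status_line(host, port=80, timeout=30.0):
        # Connect and send a bare HTTP/1.0 request by hand.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.sendall("HEAD / HTTP/1.0\r\nHost: %s\r\n\r\n" % host)
        # Same idea as the httplib patch above: only read if the socket
        # becomes readable within the timeout, otherwise give up instead
        # of blocking forever.
        r, w, e = select.select([s], [], [], timeout)
        if not r:
            s.close()
            return None                # server accepted but never answered
        data = s.recv(1024)
        s.close()
        return data.split("\r\n")[0]   # e.g. "HTTP/1.0 200 OK"

Note this only guards the read; a connect() to a host that silently drops
packets can still hang, so it is not a complete answer to the original
question about urllib.urlopen().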

msoulier at storm.ca (Michael P. Soulier) wrote in message news:<Ttzj7.58472$n75.14648351 at news4.rdc1.on.home.com>...
> Hey people. 
> 
>     I'm writing a web crawler as an exercise, using urllib and htmllib to
> recursively crawl through the pages. Whenever urllib.urlopen() throws an
> IOError exception the URL gets flagged as a broken link.
> 
>     Unfortunately, urllib.urlopen() is blocking for some time on one URL. When
> I do an nslookup on it, it times out within a few seconds, since it's a URL
> from our internal intranet at work and is not accessible from the internet.
> However, urllib.urlopen() takes forever to return. 
> 
>     Is there a way to specify a timeout for this library? I can't find a way
> in the documentation. 
> 
>     Thanks,
> 
>     Mike


