How to set a timeout?

pehr anderson pehr at alum.mit.edu
Fri Jul 20 01:43:14 EDT 2001


Dear Matthias,


What you want is to enable timeoutsocket.
By default, a TCP connection blocks until the
remote end responds, so a stalled connection
can stay open for about two minutes.
This makes it difficult to write spiders and
other useful web code like what you are writing here.

Timeoutsocket allows you to set the timeout period
for new sockets to whatever you want.
Even when the sockets are opened inside the existing
urllib module, they will still be affected by 
timeoutsocket when timeoutsocket is turned on.
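
As a rough sketch of the same idea, here is how a process-wide
default timeout might look in current Python. Note the assumptions:
this uses the standard library's socket.setdefaulttimeout() on
Python 3 rather than timeoutsocket itself, the function name
read_pages is borrowed from the question quoted below, and the
timeout parameter is only illustrative.

```python
import socket
import urllib.request


def read_pages(urllist, timeout=3.0):
    """Fetch each URL, skipping any that fail or stall past `timeout` seconds."""
    previous = socket.getdefaulttimeout()
    # Assumption: a process-wide default stands in for timeoutsocket here.
    # It applies to every socket created from now on, including the ones
    # urllib opens internally.
    socket.setdefaulttimeout(timeout)
    res = {}
    try:
        for url in urllist:
            try:
                with urllib.request.urlopen(url) as response:
                    res[url] = response.read()
            except OSError:
                # URLError and socket timeouts are both OSError subclasses;
                # skip this page and move on to the next one.
                continue
    finally:
        # Put the old default back so unrelated code is not affected.
        socket.setdefaulttimeout(previous)
    return res
```

The timeout applies to each blocking socket operation (connect,
each read), not to the total download time, so a server that
trickles data slowly can still take longer than the limit overall.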

These other solutions technically do get you somewhere,
but they leave idle network resources lying around.

http://timo-tasi.org/python/timeoutsocket.py

I think this code will eventually get included in the
standard distro as it makes it much easier for people 
to code spiders and other essential web applications.

	-pehr

Matthias Huening wrote:
> 
> I have a function like this:
> 
> -----
> def read_pages(urllist):
>     res = {}
>     for x in urllist:
>         try:
>             res[x] = urllib.urlopen(x).read()
>         except:
>             pass
>     return(res)
> -----
> 
> I now want the following behaviour: try to catch the webpage for 3 seconds;
> if it takes longer just skip this one and move on to the next. How can I
> achieve this?
> 
> Thanks, Matthias
> 
> PS Other suggestions for speeding up the process of reading, let's say, 100
> webpages are welcome...


