Speeding up network access: threading?

exarkun at twistedmatrix.com exarkun at twistedmatrix.com
Mon Jan 4 11:52:28 EST 2010


On 04:22 pm, me4 at privacy.net wrote:
>Hello,
>
>what would be the best practice for speeding up a larger number of 
>HTTP GET requests done via urllib? Until now they are made in 
>sequence, each request taking up to one second. The results must be 
>merged into a list; the original order need not be preserved.
>
>I think speed could be improved by parallelizing. One could use 
>multiple threads.
>Are there any Python best practices, or even existing modules, for 
>creating and handling a task queue with a fixed number of concurrent 
>threads?

Using multiple threads is one approach.  There are a few thread pool 
implementations lying about; one is part of Twisted, 
<http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.
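For reference, the fixed-size pool pattern the question asks about can be sketched with the standard library's concurrent.futures (a later addition than this thread; Twisted's ThreadPool provides the equivalent facility).  The fetch function is a plain blocking urllib call:

```python
# Sketch: a fixed-size thread pool running several HTTP GETs at once.
# At most max_workers requests are in flight at any moment; results
# are collected in completion order, not input order, which matches
# the question's "original sequence need not be preserved".
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def fetch(url):
    # One blocking HTTP GET; each call ties up a pool thread.
    with urlopen(url) as resp:
        return resp.read()

def fetch_all(urls, fetch=fetch, max_workers=10):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, url) for url in urls]
        for future in as_completed(futures):
            results.append(future.result())
    return results
```

The fetch parameter is injectable only so the pattern can be exercised without a network; in real use you would simply call fetch_all with a list of URLs.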

Another approach is to use non-blocking or asynchronous I/O to make 
multiple requests without using multiple threads.  Twisted can help you 
out with this, too.  There are two async HTTP client APIs available.  The 
older one:

http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html

And the newer one, introduced in 9.0:

http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html
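The shape of the non-blocking approach can be sketched with the standard library's asyncio (again a later addition; Twisted's Deferred-based clients follow the same idea, with a DeferredList playing the role of gather).  Here fetch() is a stand-in for a real request via Agent or similar, not actual network I/O:

```python
# Sketch: many "requests" in flight concurrently on a single thread.
# No thread pool is involved; the event loop interleaves the waits.
import asyncio

async def fetch(url):
    # Placeholder for a real non-blocking HTTP GET.
    await asyncio.sleep(0.01)
    return "response for %s" % url

async def fetch_all(urls):
    # All fetches run concurrently; gather() returns results in
    # input order, though completion order may differ.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(fetch_all(["http://a.example", "http://b.example"]))
```

Because every wait happens inside the event loop, total time is roughly that of the slowest request rather than the sum of all of them, which is the same win the threaded version buys, without the threads.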

Jean-Paul



More information about the Python-list mailing list