Speeding up network access: threading?
exarkun at twistedmatrix.com
Mon Jan 4 11:52:28 EST 2010
On 04:22 pm, me4 at privacy.net wrote:
>Hello,
>
>what would be best practice for speeding up a larger number of http-get
>requests done via urllib? Until now they are made in sequence, each
>request taking up to one second. The results must be merged into a
>list, while the original sequence need not be kept.
>
>I think speed could be improved by parallelizing. One could use multiple
>threads.
>Are there any python best practices, or even existing modules, for
>creating and handling a task queue with a fixed number of concurrent
>threads?
Using multiple threads is one approach. There are a few thread pool
implementations lying about; one is part of Twisted,
<http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.
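For a rough illustration of the thread-pool idea without pulling in Twisted, here is a sketch using the standard library's concurrent.futures (which postdates this thread; the original poster would have used Python 2 urllib). The helper names fetch_one and fetch_all are hypothetical, not from any library:

```python
# Fixed-size thread pool for parallel HTTP GETs, standard library only.
# fetch_one/fetch_all are hypothetical helper names for this sketch.
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen

def fetch_one(url, timeout=10):
    """Fetch a single URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

def fetch_all(urls, max_workers=8):
    """Fetch many URLs with at most max_workers concurrent threads.
    Results are collected as they complete, so the original order is
    not preserved -- matching the poster's note that it need not be."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_one, u) for u in urls]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```

The executor caps concurrency at max_workers, which gives the fixed-size task queue the poster asked about for free.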
Another approach is to use non-blocking or asynchronous I/O to make
multiple requests without using multiple threads. Twisted can help you
out with this, too. There are two async HTTP client APIs available. The
older one:
http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html
And the newer one, introduced in 9.0:
http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html
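To make the non-blocking idea concrete without requiring Twisted, here is a rough single-threaded sketch using the standard library's asyncio (which also postdates this thread). It issues bare HTTP/1.0 GETs over raw connections; the helper names get and get_many are hypothetical and are not Twisted or asyncio APIs:

```python
# Non-blocking HTTP GETs on a single thread, using asyncio from the
# standard library.  get/get_many are hypothetical names for this sketch.
import asyncio

async def get(host, port, path="/"):
    """Issue one minimal HTTP/1.0 GET and return the raw response bytes."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
    await writer.drain()
    data = await reader.read()  # HTTP/1.0: server closes when done
    writer.close()
    await writer.wait_closed()
    return data

async def get_many(targets):
    """Run all requests concurrently on one thread; gather returns the
    responses as a list in input order."""
    return await asyncio.gather(*(get(h, p, path) for h, p, path in targets))
```

All the requests overlap on one event loop, so no thread pool is needed; this is the same structural idea as Twisted's getPage/Agent, just in modern stdlib form.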
Jean-Paul
More information about the Python-list mailing list