Concurrent threads to pull web pages?
exarkun at twistedmatrix.com
Fri Oct 2 11:09:23 EDT 2009
On 05:48 am, wlfraed at ix.netcom.com wrote:
>On Fri, 02 Oct 2009 01:33:18 -0000, exarkun at twistedmatrix.com declaimed
>the following in gmane.comp.python.general:
>>There's no need to use threads for this. Have a look at Twisted:
>>
>> http://twistedmatrix.com/trac/
>
> Strange... While I can easily visualize how to convert the problem
>to a task pool -- especially given that code to do a single occurrence
>is already in place...
>
> ... conversion to an event-dispatch based system is something /I/
>can not imagine...
The cool thing is that there's not much conversion to do from the single
request version to the multiple request version, if you're using
Twisted. The single request version looks like this:
    getPage(url).addCallback(pageReceived)
And the multiple request version looks like this:
    getPage(firstURL).addCallback(pageReceived)
    getPage(secondURL).addCallback(pageReceived)
Since the APIs don't block, doing things concurrently ends up being the
easy thing.
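To make the "doesn't block" point concrete, here is a rough sketch of the same pattern using only the standard library's asyncio rather than Twisted. It is a stand-in, not Twisted's API: fake_get_page and fetch_all are hypothetical names, and the "fetch" is simulated with a sleep instead of real network I/O. The callback plays the role of pageReceived above.

```python
import asyncio


async def fake_get_page(url, delay):
    # Hypothetical stand-in for a non-blocking fetch: it sleeps
    # instead of doing any real network I/O.
    await asyncio.sleep(delay)
    return "<html>%s</html>" % url


def fetch_all(urls, delay=0.1):
    received = []

    def page_received(task):
        # Plays the role of the pageReceived callback: it runs
        # once the corresponding request has finished.
        received.append(task.result())

    async def main():
        tasks = []
        for url in urls:
            # Starting a request returns immediately, so every
            # request is in flight before any of them completes.
            task = asyncio.ensure_future(fake_get_page(url, delay))
            task.add_done_callback(page_received)
            tasks.append(task)
        # Wait until every request is done.
        await asyncio.gather(*tasks)

    asyncio.run(main())
    return received
```

Calling fetch_all with several URLs should take roughly one delay in total rather than one delay per URL, since nothing blocks while a request is pending.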
Not to say it isn't a bit of a challenge to get into this mindset, but I
think anyone who wants to put a bit of effort into it can manage. :)
Getting used to using Deferreds in the first place (necessary to
write/use even the single request version) is probably where more people
have trouble.
Jean-Paul