Concurrent threads to pull web pages?

exarkun at twistedmatrix.com
Fri Oct 2 11:09:23 EDT 2009


On 05:48 am, wlfraed at ix.netcom.com wrote:
>On Fri, 02 Oct 2009 01:33:18 -0000, exarkun at twistedmatrix.com declaimed
>the following in gmane.comp.python.general:
>>There's no need to use threads for this.  Have a look at Twisted:
>>
>>   http://twistedmatrix.com/trac/
>
>         Strange... While I can easily visualize how to convert the problem
>to a task pool -- especially given that code to do a single occurrence
>is already in place...
>
>         ... conversion to an event-dispatch based system is something /I/
>can not imagine...

The cool thing is that there's not much conversion to do from the single 
request version to the multiple request version, if you're using 
Twisted.  The single request version looks like this:

    getPage(url).addCallback(pageReceived)

And the multiple request version looks like this:

    getPage(firstURL).addCallback(pageReceived)
    getPage(secondURL).addCallback(pageReceived)

Since getPage returns right away instead of blocking until the page 
arrives, both requests are in flight at the same time.  Doing things 
concurrently ends up being the easy thing.

That's not to say it isn't a bit of a challenge to get into this mindset, 
but I think anyone who wants to put a bit of effort into it can manage. :) 
Getting used to Deferreds in the first place (necessary to write or use 
even the single request version) is probably where most people have 
trouble.
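
If it helps, the core idea can be sketched in a few lines of plain 
Python.  To be clear, ToyDeferred below is a toy written for this message 
and not Twisted's Deferred: it has no errbacks and none of the other 
features of the real thing.  It only shows that addCallback registers a 
function and returns immediately, and that the function runs later, once a 
result shows up.

```python
class ToyDeferred(object):
    # Toy stand-in, NOT Twisted's Deferred: no error handling, just a
    # result passed down a chain of callbacks.
    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, f):
        # Remember f; run it now only if the result is already here.
        self.callbacks.append(f)
        if self.called:
            self._run()
        return self

    def callback(self, result):
        # The result has arrived; hand it to the waiting callbacks.
        self.called = True
        self.result = result
        self._run()

    def _run(self):
        while self.callbacks:
            f = self.callbacks.pop(0)
            self.result = f(self.result)

log = []

def pageReceived(page):
    log.append("got %d bytes" % (len(page),))
    return page

d = ToyDeferred()
d.addCallback(pageReceived)   # returns at once; nothing has run yet
log.append("request started")
d.callback("<html></html>")   # pretend the page just arrived
```

Afterwards `log` holds "request started" followed by "got 13 bytes": the 
callback was registered first but only ran once the result was delivered.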

Jean-Paul