parallel programming in Python

Devin Jeanpierre jeanpierreda at gmail.com
Thu May 10 08:46:01 EDT 2012


On Thu, May 10, 2012 at 8:14 AM, Jabba Laci <jabba.laci at gmail.com> wrote:
> What's the best way?

From what I've heard, http://scrapy.org/ . It is a single-thread
single-process web crawler that nonetheless can download things
concurrently.
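Scrapy gets that concurrency from Twisted's event loop rather than from
threads. As an illustrative sketch only (Scrapy does not use asyncio;
the stdlib's asyncio just demonstrates the same single-thread,
single-process concurrency model, with asyncio.sleep standing in for
network latency):

```python
import asyncio
import time

# Hypothetical stand-in for a non-blocking HTTP request: the event
# loop runs other fetches while this one "waits" on the network.
async def fetch(url, delay=0.1):
    await asyncio.sleep(delay)
    return "<html for %s>" % url

async def crawl(urls):
    # All fetches run concurrently on one thread in one process;
    # the loop switches between them whenever one is waiting on I/O.
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = ["http://example.com/%d" % i for i in range(5)]
start = time.monotonic()
pages = asyncio.run(crawl(urls))
elapsed = time.monotonic() - start
# The five 0.1 s "downloads" overlap, so the wall time stays near
# 0.1 s instead of the 0.5 s a sequential loop would take.
```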

Doing what you want in Scrapy would probably involve learning about
Twisted, the library Scrapy is built on top of. That is somewhat more
involved than just throwing threads, urllib, and lxml.html together,
although most of the Twisted developers are really helpful. It might
not be worth it to you, depending on the size of the task.
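For comparison, the threads-plus-urllib approach could look something
like this minimal sketch (Python 3 names; the lxml.html parsing step
is omitted, and the injectable `fetch` parameter is just a convenience
I've added for testing):

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # Blocking download; each call ties up one worker thread until
    # the response arrives.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def download_all(urls, fetch=fetch, max_workers=8):
    # Threads overlap while blocked on network I/O, so this gets
    # concurrent downloads out of plain blocking code.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Example with a stubbed fetch, so no network is needed:
pages = download_all(["a", "b", "c"], fetch=lambda u: "page:" + u)
```

Each result comes back in the same order as the input URLs, which
keeps the calling code simple compared with callback-style APIs.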



Dave's answer is pretty general and good though.

-- Devin



More information about the Python-list mailing list