Concurrent threads to pull web pages?

MRAB python at mrabarnett.plus.com
Thu Oct 1 21:46:33 EDT 2009


Gilles Ganault wrote:
> Hello
> 
> I recently asked how to pull company IDs from an SQLite database,
> have multiple instances of a Python script download each company's web
> page from a remote server, e.g. www.acme.com/company.php?id=1, and use
> regexes to extract some information from each page.
> 
> I need to run multiple instances to save time, since each page takes
> about 10 seconds to be returned to the script/browser.
> 
> Since I've never written a multi-threaded Python script before, to
> save time investigating, I was wondering if someone already had a
> script that downloads web pages and saves some information into a
> database.
> 
> Thank you for any tip.

You could put the URLs into a queue and have multiple worker threads
repeatedly get a URL from the queue, download the page, and then put the
page into a second queue for processing by a separate extraction thread.
This post might help:

http://mail.python.org/pipermail/python-list/2009-September/195866.html
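
As a rough sketch of the two-queue arrangement described above (not the
code from the linked post), something like the following might do. It is
written for Python 3. The names fetch_page, downloader, extractor and
crawl, the worker count, the timeout and the UTF-8 decoding are all
assumptions made for illustration, and None is used as a "no more work"
marker on each queue.

```python
import queue
import re
import threading
import urllib.request


def fetch_page(url):
    """Download one page and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def downloader(url_queue, page_queue, fetch):
    """Worker thread: take URLs until a None marker arrives."""
    while True:
        url = url_queue.get()
        if url is None:
            break
        try:
            page_queue.put((url, fetch(url)))
        except Exception as exc:
            # Hand the failure on so the extractor can record it.
            page_queue.put((url, exc))


def extractor(page_queue, pattern, results):
    """Single thread: apply the regex to each downloaded page."""
    while True:
        item = page_queue.get()
        if item is None:
            break
        url, page = item
        if isinstance(page, Exception):
            results[url] = None
            continue
        match = pattern.search(page)
        results[url] = match.group(1) if match else None


def crawl(urls, pattern, num_workers=4, fetch=fetch_page):
    """Run the pipeline and return a dict of {url: first regex group or None}."""
    url_queue = queue.Queue()
    page_queue = queue.Queue()
    results = {}

    workers = [
        threading.Thread(target=downloader, args=(url_queue, page_queue, fetch))
        for _ in range(num_workers)
    ]
    extract_thread = threading.Thread(
        target=extractor, args=(page_queue, pattern, results)
    )
    for thread in workers + [extract_thread]:
        thread.start()

    for url in urls:
        url_queue.put(url)
    for _ in workers:
        url_queue.put(None)  # one stop marker per downloader

    for thread in workers:
        thread.join()
    page_queue.put(None)  # all downloads done, so stop the extractor
    extract_thread.join()
    return results
```

It could be called as crawl(urls, re.compile(r"<title>(.*?)</title>")),
where urls is a list built from the company IDs and the pattern is only a
placeholder. Keeping the extraction in one thread also gives a natural
place to write results to the database from a single connection.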



