Concurrent threads to pull web pages?

exarkun at twistedmatrix.com
Thu Oct 1 21:33:18 EDT 2009


On 1 Oct, 09:28 am, nospam at nospam.com wrote:
>Hello
>
>I recently asked how to pull companies' IDs from an SQLite database,
>have multiple instances of a Python script download each company's web
>page from a remote server, e.g. www.acme.com/company.php?id=1, and use
>regexes to extract some information from each page.
>
>I need to run multiple instances to save time, since each page takes
>about 10 seconds to be returned to the script/browser.
>
>Since I've never written a multi-threaded Python script before, to
>save time investigating, I was wondering if someone already had a
>script that downloads web pages and saves some information into a
>database.

There's no need to use threads for this.  Have a look at Twisted:

  http://twistedmatrix.com/trac/

Here's an example of how to use the Twisted HTTP client:

  http://twistedmatrix.com/projects/web/documentation/examples/getpage.py
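To make the "no threads" idea concrete, here is a rough sketch of the same single-threaded, event-loop approach. Note that it uses the standard library's asyncio rather than Twisted, so it is an illustration of the technique and not the Twisted API. Everything named here is an assumption for the sake of the example: fake_fetch stands in for the real HTTP request (it only sleeps), NAME_RE is a made-up pattern since the real page markup isn't shown, and the limit of 5 requests in flight is arbitrary.

```python
import asyncio
import re

# Hypothetical pattern; the real pages' markup is not shown in this thread.
NAME_RE = re.compile(r"<h1>(.*?)</h1>")


async def fake_fetch(company_id):
    """Stand-in for an HTTP request; sleeps instead of touching the network."""
    await asyncio.sleep(0.1)  # pretend this is the slow round trip
    return "<html><h1>Company %d</h1></html>" % company_id


async def fetch_one(company_id, fetch, semaphore):
    # The semaphore caps how many requests are in flight at once.
    async with semaphore:
        page = await fetch(company_id)
    match = NAME_RE.search(page)
    return company_id, match.group(1) if match else None


async def fetch_all(company_ids, fetch=fake_fetch, limit=5):
    semaphore = asyncio.Semaphore(limit)
    tasks = [fetch_one(cid, fetch, semaphore) for cid in company_ids]
    # gather() runs the coroutines concurrently on one thread and
    # returns results in the same order as company_ids.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    for company_id, name in asyncio.run(fetch_all(range(1, 11))):
        print(company_id, name)
```

While one request is waiting on the server, the event loop moves on to the others, so the total time is driven by the slowest batch rather than the sum of all the waits. To use it for real you would swap fake_fetch for a coroutine that performs the actual download, and feed in the IDs read from the SQLite database.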

Jean-Paul


