Tips on Speeding up Python Execution

Chris Angelico rosuav at gmail.com
Fri Apr 8 03:25:20 EDT 2011


On Fri, Apr 8, 2011 at 5:04 PM, Abhijeet Mahagaonkar
<abhijeet.manohar at gmail.com> wrote:
> I was able to isolate that the major chunk of run time is eaten up in
> opening webpages, reading from them and extracting text.
> I wanted to know if there is a way to call the functions concurrently.

So, to clarify: you have code that's loading lots of separate pages,
and the time is spent waiting for the internet? If you're saturating
your connection, then this won't help, but if they're all small pages
and they're coming over the internet, then yes, you certainly CAN
fetch them concurrently. As the Perl folks say, There's More Than One
Way To Do It; one is to spawn a thread for each request, then collect
up all the results at the end. Look up the 'threading' module for
details:

http://docs.python.org/library/threading.html
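Something like this (a minimal, untested sketch; the URLs are
placeholders) would fire off all the requests at once:

import threading
import urllib2

def fetch(url, results, index):
    # Store the page body in this thread's slot of the shared list.
    results[index] = urllib2.urlopen(url).read()

urls = ['http://www.example.com/a', 'http://www.example.com/b']
results = [None] * len(urls)
threads = [threading.Thread(target=fetch, args=(u, results, i))
           for i, u in enumerate(urls)]
for t in threads:
    t.start()
for t in threads:
    t.join()    # wait for every fetch to complete
# results[i] now holds the body of urls[i]

Each thread blocks in urlopen() independently, so the waiting
overlaps instead of adding up.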

It should also be possible to use asynchronous I/O and select()
directly, but I couldn't see a way to do that with urllib/urllib2. If
you're using sockets directly, this ought to be an option.
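For completeness, the select() route over raw sockets looks roughly
like this. It's a sketch only: the hostnames are made up, the
connects are left blocking for brevity, and there's no real HTTP
parsing or error handling.

import select
import socket

hosts = ['www.example.com', 'www.example.org']    # hypothetical targets

conns = {}
for host in hosts:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, 80))                  # blocking connect, for brevity
    s.sendall('GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host)
    s.setblocking(0)
    conns[s] = (host, [])

while conns:
    readable, _, _ = select.select(conns.keys(), [], [])
    for s in readable:
        data = s.recv(4096)
        if data:
            conns[s][1].append(data)
        else:                              # server closed: response complete
            host, chunks = conns.pop(s)
            s.close()
            print host, 'returned', len(''.join(chunks)), 'bytes'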

I don't know which option is the most Pythonesque, but if you already
have specific Python code for each of your functions, it's probably
going to be easiest to spawn threads for them all.

Chris Angelico
Threading fan ever since he met OS/2 in 1993 or so


