[Tutor] Threads

Terry Carroll carroll at tjc.com
Wed Nov 17 19:49:29 CET 2004


Thanks to everyone for the ideas on using threads for my scraper app.

I was unable to sleep last night (my wife snores), so having nothing 
better to do, I got up and played with Python.  I was able to convert my 
serial download program to a threaded app pretty quickly.

My first step was to discard the use of the list of lists, in which I was 
storing the URLs from which to download, in favor of a Queue object, and 
then continuing to process the entries the same way.  Once that was done, 
I found it pretty straightforward to take the consumer part of the program 
and turn it into a thread.
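
For anyone following along, here is a minimal sketch of that Queue-plus-consumer-thread arrangement. It is not my actual program: fetch() is a hypothetical stand-in for the real download, the names are invented for illustration, and it is written for Python 3, where the module is spelled queue rather than Queue.

```python
import queue
import threading

def fetch(url):
    # Hypothetical stand-in for the real download step.
    return "contents of " + url

def consumer(work, results):
    # Take URLs off the queue until none are left, then exit.
    while True:
        try:
            url = work.get_nowait()
        except queue.Empty:
            return
        results.append((url, fetch(url)))

def run(urls, num_threads=4):
    # The queue is filled completely before any consumer starts,
    # so an empty queue really does mean "no more work".
    work = queue.Queue()
    for url in urls:
        work.put(url)
    results = []
    threads = [threading.Thread(target=consumer, args=(work, results))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Exiting on an empty queue is only safe here because the whole list is queued up front; once the producer runs alongside the consumers, some explicit shutdown signal is needed instead.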

Great results.  The serial approach took about 21 minutes to process 20 
files: about a minute to generate the list of files, and then 
about a minute for each file.  With my present threaded approach, 
using 4 threads, that's cut down to about 6 minutes: one minute to 
generate the list, and then five minutes for the threads to download five 
files each.  Of course, increasing the number of threads made it even 
faster.  I went up to six, but feel I'm being abusive if I use more than 
about 4.
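
The arithmetic behind those figures can be sketched as below. This is a back-of-the-envelope estimate only. It assumes every file takes the same time and that parallel downloads do not slow each other down, and the function name and defaults are made up for illustration.

```python
import math

def estimated_minutes(num_files, num_threads,
                      list_minutes=1.0, minutes_per_file=1.0):
    # Each thread handles its share of files one after another, so the
    # download phase lasts as long as the busiest thread's share.
    files_per_thread = math.ceil(num_files / num_threads)
    return list_minutes + files_per_thread * minutes_per_file
```

Under those assumptions, estimated_minutes(20, 1) gives 21.0 and estimated_minutes(20, 4) gives 6.0, matching what I saw.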

I plan to go back and rework the part that generates the queue.  As 
written, it first generates a list of URLs of pages to process, and then 
processes each of those pages; each page in turn has a URL pointing to the 
file I want to download.  I'm going to rework this so that each page is 
processed as soon as it is identified, and its queue entry made 
immediately, rather than identifying all 20 pages first.  This would allow 
my consumer threads to begin work much earlier, rather than waiting the 
minute or so for the queue to be built in its entirety.
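
A rough sketch of that planned rework, with the consumers started before the producer so they can pick up entries as they appear. The names are hypothetical, find_file_url() stands in for the real page processing, and for simplicity this version stops the consumers with one STOP marker per thread. It is written for Python 3 (module queue).

```python
import queue
import threading

STOP = None  # marker telling a consumer there is no more work

def find_file_url(page_url):
    # Hypothetical stand-in: processing a page yields the file's URL.
    return page_url + "/file.dat"

def producer(page_urls, work):
    # Queue each file URL as soon as its page has been processed.
    for page_url in page_urls:
        work.put(find_file_url(page_url))

def consumer(work, downloaded):
    while True:
        file_url = work.get()      # blocks until an entry is available
        if file_url is STOP:
            return
        downloaded.append(file_url)  # the real download would go here

def run_pipeline(page_urls, num_threads=4):
    work = queue.Queue()
    downloaded = []
    threads = [threading.Thread(target=consumer, args=(work, downloaded))
               for _ in range(num_threads)]
    for t in threads:
        t.start()                  # consumers wait on the empty queue
    producer(page_urls, work)
    for _ in threads:
        work.put(STOP)             # one marker per consumer
    for t in threads:
        t.join()
    return downloaded
```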

By the way, the shutdown method I chose was as we discussed yesterday: add 
an element to the queue with a shutdown flag on it.  When a thread popped 
this element off the queue, it requeued it for the next thread to discover 
and shut down.  Worked like a champ first time.
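
In outline, the shutdown scheme looks something like the following. This is a simplified sketch rather than my actual code: SHUTDOWN, worker() and run_with_shutdown_flag() are names made up for illustration, and it is written for Python 3 (module queue).

```python
import queue
import threading

SHUTDOWN = object()  # unique flag object; cannot be mistaken for real work

def worker(work, done):
    while True:
        item = work.get()
        if item is SHUTDOWN:
            # Put the flag back so the next thread finds it too.
            work.put(SHUTDOWN)
            return
        done.append(item)  # real processing would go here

def run_with_shutdown_flag(items, num_threads=4):
    work = queue.Queue()
    for item in items:
        work.put(item)
    work.put(SHUTDOWN)  # a single flag, queued after all the real work
    done = []
    threads = [threading.Thread(target=worker, args=(work, done))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, work
```

Because the queue is first-in, first-out, the flag is only reached after all the real entries have been taken, and a single flag is enough no matter how many threads are running.  One copy of it is left on the queue after the last thread exits.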

I'm not a programmer any more, so having something work the first time is 
a pretty big deal for me these days!


