Python Thread Question

Anand B Pillai abpillai at lycos.com
Thu Apr 17 09:34:11 EDT 2003


Hi Pythonistas

 I have written an application that is a kind of intranet
 web spider. It crawls a given URL, retrieves the files at
 that URL, and saves them to disk.

 Now, when I do this using multiple threads (Python threads),
 assigning each URL to a thread, I find that the downloads
 complete faster than they do in a single thread. I assume
 the reason is simple: with a single thread, the app has to
 wait until each file is downloaded, whereas with a thread
 per download, the app can start other downloads while one
 is waiting, so no time is lost. I fire off a group of threads
 (limited by a maxthread count) and pool them in a threadgroup.
 Once the download threads are started, the app does not try
 to control them until they finish, are killed, or a network
 timeout occurs.
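 The design described above (one thread per URL, with at most
 a fixed number downloading at once) can be sketched roughly as
 follows. This is a minimal illustration, not the original
 spider: MAX_THREADS, FAKE_DELAY, crawl() and the example URLs
 are hypothetical, and time.sleep() stands in for a blocking
 network read so the sketch runs without a network.

```python
import threading
import time

MAX_THREADS = 4   # hypothetical value for the "maxthread count"
FAKE_DELAY = 0.1  # seconds; hypothetical stand-in for network latency


def crawl(urls):
    """Start one thread per URL, but let at most MAX_THREADS download at once.

    Returns (results, peak), where peak is the largest number of
    downloads observed running at the same time.
    """
    results = {}
    lock = threading.Lock()                          # protects results and counters
    limiter = threading.BoundedSemaphore(MAX_THREADS)
    state = {"active": 0, "peak": 0}

    def fetch(url):
        # Hypothetical download: a real spider would open the URL
        # and write the response to disk here.
        with limiter:
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(FAKE_DELAY)                   # blocking wait, like a socket read
            with lock:
                results[url] = "contents of " + url
                state["active"] -= 1

    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                                     # wait for every download to finish
    return results, state["peak"]


if __name__ == "__main__":
    urls = ["http://intranet/page%d" % i for i in range(8)]  # hypothetical URLs
    start = time.perf_counter()
    results, peak = crawl(urls)
    print(len(results), "pages, peak concurrency", peak,
          "in %.2f s" % (time.perf_counter() - start))
```

 While one thread is blocked waiting, the others keep running,
 so eight simulated downloads take roughly two delays instead of
 eight. A single-threaded loop over the same URLs would take the
 full eight.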

 In general, multithreading does not necessarily improve the
 speed of an application, but in cases like this, which involve
 bottlenecks such as network traffic, it does. My questions
 are:

 1. Do Python threads work only if the native platform supports
    threading? That is, does Python create C-level threads that in
    turn use the platform thread API (Win32 threads on Windows,
    pthreads on Linux, etc.)?
 2. Can a software API (Win32/pthreads) do multithreading even if
    the CPU does not support multithreading? (This might seem like a
    superfluous question when almost all CPUs do in this age, but it
    is still valid.) Or is multithreading ultimately tied to how the
    CPU handles threads?
 3. Is the apparent speedup in my program from using multiple
    threads attributable to the CPU, the platform API, or Python?

 4. Can I safely say that multithreading will improve my application's
    performance if it has similar work to do on many resources at the
    same time (e.g., a web parser, a spider, a disk-to-disk file copier,
    or a directory synchronizer)? Or does it depend on the nature of
    the task at hand?
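 Question 1 can be probed empirically on a modern CPython with
 a small experiment like the one below. It is a sketch, not an
 answer: it assumes Python 3.8 or later (threading.get_native_id()
 did not exist when this was written), and native_ids() is a
 hypothetical helper. A barrier keeps all threads alive at the
 same moment, so distinct values indicate that each
 threading.Thread is backed by its own operating-system thread.

```python
import threading


def native_ids(n):
    """Return the OS-level thread id reported by each of n Python threads."""
    barrier = threading.Barrier(n)  # all n threads must be alive before any proceeds
    lock = threading.Lock()
    ids = []

    def worker():
        barrier.wait()
        with lock:
            ids.append(threading.get_native_id())  # id assigned by the OS kernel

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    ids = native_ids(4)
    print("main thread:", threading.get_native_id(), "workers:", ids)
```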

Well, that is quite a lot.

Thanks for your help,

Regards

Anand Pillai