Do I have to use threads?
Gary Herron
gherron at islandtraining.com
Wed Jan 6 02:29:20 EST 2010
aditya shukla wrote:
> Hello people,
>
> I have 5 directories corresponding to 5 different URLs. I want to
> download images from those URLs and place them in the respective
> directories. I have to extract the contents and download them
> simultaneously. I can extract the contents and do them one by one. My
> question is: for doing it simultaneously, do I have to use threads?
>
> Please point me in the right direction.
>
>
> Thanks
>
> Aditya
You've been given some bad advice here.
First, threads are lighter-weight than processes, so threads are
probably *more* efficient. However, with only five threads or processes,
the difference is probably not noticeable. (If the prejudice against
threads comes from concerns over the GIL, that concern is also misplaced
in this instance. Since you have only one network connection, you
will receive only one packet at a time, so only one thread will be
active at a time. If the extraction process uses enough CPU time that
the extractions are all running at the same time *AND* if you are
running on a machine with separate CPUs/cores *AND* you would like the
extractions to run truly in parallel on those separate cores, *THEN*,
and only then, will processes be more efficient than threads.)
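A minimal sketch of the one-thread-per-download approach, assuming Python 3 and plain `urllib.request`. The names `fetch_url`, `download_to`, and `download_all_threaded`, and the (url, directory, filename) job layout, are hypothetical. Each thread spends nearly all of its time blocked waiting on the network, and CPython releases the GIL during that wait, so the five downloads overlap. The `fetch` parameter exists only so the fetching function can be swapped out (for example in a test).

```python
import os
import threading
import urllib.request


def fetch_url(url, timeout=30):
    """Blocking HTTP GET; returns the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_to(url, directory, filename, fetch=fetch_url):
    """Fetch one URL and save it as directory/filename."""
    os.makedirs(directory, exist_ok=True)
    data = fetch(url)  # the thread blocks here; the GIL is released while it waits on I/O
    path = os.path.join(directory, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path


def download_all_threaded(jobs, fetch=fetch_url):
    """Run one thread per (url, directory, filename) job and wait for all.

    An exception in one thread is reported by the threading module and
    does not stop the others; this sketch does not collect errors.
    """
    threads = [
        threading.Thread(target=download_to, args=(url, directory, filename, fetch))
        for url, directory, filename in jobs
    ]
    for t in threads:
        t.start()  # all downloads are now in flight
    for t in threads:
        t.join()   # wait until every download has finished
```

For example, `download_all_threaded([(url1, "dir1", "a.png"), (url2, "dir2", "b.png")])`, where the URLs and directory names are placeholders.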
Second, running 5 wgets is equivalent to 5 processes, not 5 threads.
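To make the wget point concrete, here is a sketch, assuming `wget` is installed and on the PATH, that starts one `wget` per URL from Python with `subprocess.Popen`. Each `Popen` call creates a separate operating-system process with its own PID; no threads are involved. `wget_command` and `run_concurrently` are hypothetical names; `-q` and `-P` are wget's `--quiet` and `--directory-prefix` options.

```python
import subprocess


def wget_command(url, directory):
    """Build a wget command line; -P (--directory-prefix) sets the target directory."""
    return ["wget", "-q", "-P", directory, url]


def run_concurrently(commands):
    """Start every command at once, then wait for all of them.

    Each Popen is a separate operating-system process with its own PID.
    Returns a list of (pid, exit_code) pairs in the order given.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all started before any wait
    return [(p.pid, p.wait()) for p in procs]
```

Usage would be `run_concurrently([wget_command(u, d) for u, d in jobs])`, with `jobs` a placeholder list of (url, directory) pairs.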
And third, you don't have to use either threads *or* processes. There
is another possibility which is much lighter-weight: asynchronous
I/O, available through the low-level select module, or more usefully
via the higher-level asyncore module. (Although the learning curve
might trip you up, and some people find the programming model for
asyncore hard to fathom, I find it more intuitive in this case than
threads/processes.)
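A rough sketch of the low-level select approach, assuming plain `http://` URLs and HTTP/1.0 (so the server closes the connection when the response is complete), with no redirects, TLS, or status-code handling. For brevity it connects and sends each small request in blocking mode, then multiplexes all of the reads in a single thread with `select.select`. `fetch_all_select`, `split_response`, and `_http10_get` are hypothetical names.

```python
import select
import socket
from urllib.parse import urlsplit


def _http10_get(parts):
    """Encode a minimal HTTP/1.0 GET request for a urlsplit() result."""
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname + (":%d" % parts.port if parts.port else "")
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")


def fetch_all_select(urls):
    """Download all URLs concurrently in one thread; return {url: raw response}."""
    pending = {}  # socket -> (url, list of received chunks)
    for url in urls:
        parts = urlsplit(url)
        sock = socket.create_connection((parts.hostname, parts.port or 80))
        sock.sendall(_http10_get(parts))  # small request: sent in blocking mode for brevity
        sock.setblocking(False)
        pending[sock] = (url, [])

    results = {}
    while pending:
        # Sleep until at least one socket has data (or has been closed by the server).
        readable, _, _ = select.select(list(pending), [], [])
        for sock in readable:
            url, chunks = pending[sock]
            data = sock.recv(65536)
            if data:
                chunks.append(data)
            else:  # EOF: with HTTP/1.0 the response is complete
                sock.close()
                del pending[sock]
                results[url] = b"".join(chunks)
    return results


def split_response(raw):
    """Return (status line, body) from a raw HTTP response."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.split(b"\r\n", 1)[0], body
```

The design choice here is that one loop owns every connection: whichever socket has data gets serviced, so a slow server never holds up the others, and no locking is needed.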
In fact, the asyncore manual page has a ~20-line class which implements
web page retrieval. You could replace that example's single call to
http_client with five calls, one for each of your URLs. Then, when you
reach the last line (that is, the asyncore.loop() call), the five will be
downloading simultaneously.
See http://docs.python.org/library/asyncore.html
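Note that `asyncore` was deprecated in Python 3.6 and removed in Python 3.12 (PEP 594); its standard-library successor is `asyncio`. The sketch below swaps in asyncio streams for the asyncore example's dispatcher class but keeps the same shape: one raw HTTP/1.0 GET per URL, all started together, and a single call (`asyncio.run`, playing the role of `asyncore.loop()`) that drives all five downloads concurrently. It assumes plain `http://` URLs, does not check status codes, and names each file after the last component of the URL path; `fetch_body`, `save_to`, and `download_all_async` are hypothetical names.

```python
import asyncio
import os
from urllib.parse import urlsplit


async def fetch_body(url):
    """Plain HTTP/1.0 GET over asyncio streams; returns the response body."""
    parts = urlsplit(url)
    reader, writer = await asyncio.open_connection(parts.hostname, parts.port or 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname + (":%d" % parts.port if parts.port else "")
    writer.write(("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii"))
    await writer.drain()
    raw = await reader.read()  # HTTP/1.0: read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    _, _, body = raw.partition(b"\r\n\r\n")  # drop the status line and headers
    return body


async def save_to(url, directory):
    """Download url into directory, named after the last component of its path."""
    body = await fetch_body(url)
    os.makedirs(directory, exist_ok=True)
    name = os.path.basename(urlsplit(url).path) or "index.html"
    path = os.path.join(directory, name)
    with open(path, "wb") as f:  # small local write; briefly blocking the loop is acceptable here
        f.write(body)
    return path


async def download_all_async(jobs):
    """jobs: iterable of (url, directory); all downloads run concurrently."""
    return await asyncio.gather(*(save_to(url, d) for url, d in jobs))
```

Usage would be `asyncio.run(download_all_async(jobs))`, with `jobs` a placeholder list of five (url, directory) pairs; `gather` returns the saved paths in the same order as the jobs.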
Gary Herron
More information about the Python-list
mailing list