Do I have to use threads?

Jorgen Grahn grahn+nntp at snipabacken.se
Fri Jan 8 09:21:38 EST 2010


On Wed, 2010-01-06, Gary Herron wrote:
> aditya shukla wrote:
>> Hello people,
>>
>> I have 5 directories corresponding to 5 different URLs. I want to 
>> download images from those URLs and place them in the respective 
>> directories. I have to extract the contents and download them 
>> simultaneously. I can extract the contents and do them one by one. My 
>> question is: for doing it simultaneously, do I have to use threads?
>>
>> Please point me in the right direction.
>>
>>
>> Thanks
>>
>> Aditya
>
> You've been given some bad advice here.
>
> First -- threads are lighter-weight than processes, so threads are 
> probably *more* efficient.  However, with only five threads/processes, 
> the difference is probably not noticeable.  (If the prejudice against 
> threads comes from concerns over the GIL -- that also is a misplaced 
> concern in this instance.  Since you only have one network connection, you 
> will receive only one packet at a time, so only one thread will be 
> active at a time.  If the extraction process uses a significant enough 
> amount of CPU time

I wonder what that "extraction" would be, by the way.  Unless you ask
for compression of the HTTP data, the images come as-is on the TCP
stream.

> so that the extractions are all running at the same 
> time *AND* if you are running on a machine with separate CPU/cores *AND* 
> you would like the extractions to be running truly in parallel on those 
> separate cores,  *THEN*, and only then, will processes be more efficient 
> than threads.)

I can't remember what the bad advice was, but here processes versus
threads clearly doesn't matter performance-wise.  I generally
recommend processes, because how they work is well-known and they're
not as vulnerable to weird synchronization bugs as threads.
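
For illustration only, here is a minimal sketch of what the threaded
variant could look like in present-day Python 3.  It assumes the
concurrent.futures and urllib.request modules; the names fetch,
download_all and jobs are made up for this example, and deriving the
file name from the last URL path component is a simplification.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Return the raw bytes behind url (a blocking network call)."""
    with urlopen(url) as response:
        return response.read()


def download_all(jobs, fetch=fetch, max_workers=5):
    """Download every (url, directory) pair in jobs concurrently.

    Returns the file paths written, in the same order as jobs.
    """
    def one(job):
        url, directory = job
        os.makedirs(directory, exist_ok=True)
        # Simplification: name the file after the last URL path component.
        name = url.rstrip("/").rsplit("/", 1)[-1] or "index"
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(fetch(url))
        return path

    # One worker thread per job; the threads spend nearly all their time
    # blocked on the network, so the GIL is not a bottleneck here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, jobs))
```

The workers share no mutable state, which is what keeps this free of
the synchronization bugs mentioned above.  The fetch parameter is
there only so the download step can be replaced, for example in a
test that must not touch the network.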

> Second, running 5 wgets is equivalent to 5 processes not 5 threads.
>
> And third -- you don't have to use either threads *or* processes.  There 
> is another possibility which is much more light-weight:  asynchronous 
> I/O,  available through the low level select module, or more usefully 
> via the higher-level asyncore module.

Yeah, that would be my first choice too for a problem which isn't
clearly CPU-bound.  Or my second choice -- the first would be calling
on a utility like wget(1).
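
The asynchronous-I/O route, with one thread multiplexing several
connections, can be sketched roughly as below.  This is only an
outline under some assumptions: it uses Python 3's selectors module
(a thin layer over select; asyncore has since been removed from the
standard library, in Python 3.12), it expects sockets that are
already connected and have had their requests sent, and the name
read_all is made up for the example.

```python
import selectors
import socket


def read_all(socks):
    """Read every socket in socks to EOF in one thread; return {sock: bytes}."""
    sel = selectors.DefaultSelector()
    chunks = {s: [] for s in socks}
    for s in socks:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)

    pending = len(socks)
    while pending:
        # Block until at least one socket has data (or has hit EOF).
        for key, _ in sel.select():
            sock = key.fileobj
            try:
                data = sock.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup, try again on the next round
            if data:
                chunks[sock].append(data)
            else:
                # An empty read means the peer closed the connection.
                sel.unregister(sock)
                pending -= 1

    sel.close()
    return {s: b"".join(parts) for s, parts in chunks.items()}
```

A real downloader would still have to speak HTTP on top of this
(send the request, parse the headers), which is a large part of why
handing the job to wget is attractive.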

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .


