parallel downloads

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sun Mar 9 22:11:16 CET 2008


On Sat, 08 Mar 2008 14:47:45 -0200, Gary Herron
<gherron at islandtraining.com> wrote:

> poof65 wrote:
>> For your problem you have to use threads.
>>
> Not at all true.  Threads provide one way to solve this, but another is
> the select function.  For this simple case, select() may (or may not) be
> easier to write.  Pseudo-code would look something like this:
>
>   openSockets = list of sockets, one per download file
>   while openSockets:
>     # Identify sockets with data to be read
>     readySockets = select(openSockets ...)
>     for each s in readySockets:
>       read from s and do whatever with the data
>       if s is at EOF: close and remove s from openSockets
>
> That's it.  Far easier than threads.

Easier? Only if you omit all the relevant details. For example, you
read some data from one socket, part of a file you're downloading.
Where do you write it? You need additional structures to keep track of
which socket feeds which output file.
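
To make that bookkeeping concrete, here is a minimal sketch of the select()
loop with the missing structure filled in. It assumes Python 3 and sockets
that are already connected and have had their requests sent; the names
drain_sockets and sock_to_path are made up for illustration and are not
part of the quoted pseudocode.

```python
import select


def drain_sockets(sock_to_path):
    """Copy everything each socket delivers into its own output file.

    sock_to_path (a hypothetical name) maps a connected socket to the
    path where its data should be written.
    """
    # This dict is the "additional structure": it remembers which open
    # file belongs to which socket.
    outputs = {s: open(path, "wb") for s, path in sock_to_path.items()}
    try:
        while outputs:
            # Block until at least one socket has data (or has hit EOF).
            ready, _, _ = select.select(list(outputs), [], [])
            for s in ready:
                data = s.recv(65536)
                if data:
                    outputs[s].write(data)
                else:
                    # An empty read means the peer closed the connection.
                    outputs.pop(s).close()
                    s.close()
    finally:
        # Close any files still open if the loop was interrupted.
        for f in outputs.values():
            f.close()
```

Even in this small form, the state that a thread would keep in local
variables has to live in a dictionary shared by the whole loop.
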
Pseudocode for the threaded version, complete with socket creation:

def downloadfile(url, fn):
   s = create socket for url
   f = open filename for writing
   shutil.copyfileobj(s.makefile(), f)

for each url, filename to retrieve:
   t = threading.Thread(target=downloadfile, args=(url,filename))
   add t to threadlist
   t.start()

for each t in threadlist:
   t.join()

The downloadfile function looks simpler to me - it's what anyone would
write in a single-threaded program, using local variables and keeping
its full state in one place.
The above pseudocode can be converted directly into Python code - no
additional structures or code are required.
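
As a rough sketch of that conversion, assuming Python 3 and using
urllib.request.urlopen in place of "create socket for url" (my choice for
the example; any file-like source would do), it could look like this.
The name download_all and the jobs argument are illustrative.

```python
import shutil
import threading
import urllib.request


def download_file(url, filename):
    # urlopen() returns a file-like response, so copyfileobj can stream
    # it straight into the output file. Both are closed on exit.
    with urllib.request.urlopen(url) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f)


def download_all(jobs):
    # jobs is an iterable of (url, filename) pairs.
    threads = []
    for url, filename in jobs:
        t = threading.Thread(target=download_file, args=(url, filename))
        threads.append(t)
        t.start()
    # Wait until every download has finished.
    for t in threads:
        t.join()
```

Note that an exception raised inside a thread is not re-raised by join(),
so real code would want some way to report failed downloads.
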

Of course, don't try to download a million files at the same time -  
neither a million sockets nor a million threads would work.
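
One way to keep the number of simultaneous downloads bounded, sketched
here under the assumption of Python 3 and its concurrent.futures module,
is a fixed-size thread pool. The names download_bounded and max_workers=8
are arbitrary picks for the example, not recommendations.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def download_file(url, filename):
    # Same single-threaded-looking function as before.
    with urllib.request.urlopen(url) as response, open(filename, "wb") as f:
        shutil.copyfileobj(response, f)
    return filename


def download_bounded(jobs, max_workers=8):
    # At most max_workers downloads run at once; the rest wait in the
    # pool's internal queue until a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_file, url, fn) for url, fn in jobs]
        # result() re-raises any exception from the worker thread.
        return [f.result() for f in futures]
```
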

-- 
Gabriel Genellina



