Fri Nov 5 16:54:44 CET 2004

> I need to download 150000 files several times every day.
> How would you solve that?

Best bet is to use an asynchronous socket library (asyncore / asynchat / twisted 
/ etc). On Linux it's fairly easy to increase the number of file descriptors 
allowed per process, and if you manage your connections appropriately you can 
have thousands of simultaneous open connections.
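
A rough sketch of those two pieces, raising the descriptor limit and capping the number of connections in flight, might look like the following. It uses asyncio rather than asyncore, which has since been removed from the standard library (Python 3.12). The names raise_fd_limit and run_bounded are made up for illustration, and the worker coroutine that actually downloads one file is left to you. The resource module is Unix-only.

```python
import asyncio
import resource


def raise_fd_limit(wanted=4096):
    """Raise the soft open-file limit toward `wanted`; return the limit in effect.

    The soft limit can only go as high as the hard limit without extra
    privileges, so the request is capped there.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, leave it alone
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        return soft  # the platform refused; keep the old limit
    return wanted


async def run_bounded(jobs, worker, limit):
    """Run `worker(job)` for every job with at most `limit` running at once.

    Results come back in the same order as `jobs`.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:
            return await worker(job)

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With a hypothetical fetch_one(url) coroutine, usage would be something like asyncio.run(run_bounded(urls, fetch_one, limit=raise_fd_limit() // 2)), leaving headroom for the descriptors the process needs for other things.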

Elsewhere I noticed that you said the average object size is only around 15k, so 
once you have the basic system working, you can probably improve performance 
by focusing on things that lower transaction overhead - DNS caching, reusing 
connections to servers (and pipelining HTTP requests to those servers), etc., 
but I wouldn't bother with that right away - better to get the core of the 
system stable and scalable first.
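
For when you do get to that stage, here is a minimal sketch of two of those ideas: an in-process DNS cache and one reused (keep-alive) connection per server. It assumes plain http:// URLs and uses the blocking http.client module to keep it short, so it would need adapting to fit an asynchronous core. The names resolve, group_by_host and fetch_host are invented for this example. Pipelining is not shown, since http.client does not support it.

```python
import http.client
import socket
from collections import defaultdict
from functools import lru_cache
from urllib.parse import urlsplit


@lru_cache(maxsize=None)
def resolve(host, port=80):
    """Look up `host` once; repeat calls are answered from the cache."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return infos[0][4][0]  # address part of the first result


def group_by_host(urls):
    """Map (host, port) -> list of request paths, so each server's
    files can all be fetched over the same connection."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        groups[(parts.hostname, parts.port or 80)].append(path)
    return dict(groups)


def fetch_host(host, port, paths, timeout=30):
    """Fetch every path from one server over a single connection."""
    # Connect to the cached address but send the real name in Host.
    host_header = host if port == 80 else f"{host}:{port}"
    conn = http.client.HTTPConnection(resolve(host, port), port, timeout=timeout)
    bodies = {}
    try:
        for path in paths:
            conn.request("GET", path, headers={"Host": host_header})
            response = conn.getresponse()
            # The body must be read fully before the next request is sent.
            bodies[path] = response.read()
    finally:
        conn.close()
    return bodies
```

Grouping first means the connection setup cost is paid once per server instead of once per file, which matters more as the files get smaller.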


