What's the cost of using hundreds of threads?
P.Rozycki at elka.pw.edu.pl
Wed Mar 2 12:28:30 CET 2005
> I'm a bit confused by your math. Fifty connections should be 102
> threads, which is quite reasonable.
My formula applies to one forwarded ('loadbalanced') connection. Every
such connection creates n further connections (pipes) which share the
load. Every pipe requires two threads to be spawned. Every 'main
connection' spawns two other threads - so my formula, 2*pipes+2, gives
the number of threads spawned per 'main connection'.
Now if conn_count connections are established, the thread count is
conn_count * threads_per_main_connection = conn_count * (2*pipes+2)
For 50 connections and about 10 pipes that gives 50 * 22 = 1100 threads.
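
The arithmetic can be checked with a few lines of Python. This is only a
sketch of the formula above; the function names are made up for
illustration and do not come from the actual program.

```python
def threads_per_main_connection(pipes):
    # Two threads per pipe, plus the two threads that every
    # 'main connection' spawns on its own.
    return 2 * pipes + 2


def threads_total(conn_count, pipes):
    # Every established main connection brings its own set of pipes.
    return conn_count * threads_per_main_connection(pipes)


print(threads_total(50, 10))  # 50 * (2*10 + 2) = 1100
```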
> My experience with lots of threads dates back to Python 1.5.2, but I
> rarely saw much improvement with more than a hundred threads, even for
> heavily I/O-bound applications on a multi-CPU system. However, if your
> focus is algorithmic complexity, you should be able to handle a couple of
> thousand threads easily enough.
I don't spawn them for computational reasons, but because it makes my
code much simpler. I use built-in TCP features to achieve
loadbalancing - every flow (directed through a pipe) has its own
dedicated threads - separate for down- and upload. For every 'main
connection' these threads share the send and receive buffers. If any of
the pipes is congested, the corresponding threads block in their send /
recv calls - without affecting the independence of the data flows.
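
A minimal sketch of the two-threads-per-pipe idea, assuming plain
blocking sockets. This is not the actual code: the names pump and
start_pipe are hypothetical, the shared per-connection buffers are left
out, and socket.socketpair() only stands in for real TCP connections so
the example runs on its own.

```python
import socket
import threading


def pump(src, dst, bufsize=4096):
    """Copy bytes from src to dst until src reaches end-of-stream."""
    while True:
        data = src.recv(bufsize)      # blocks while nothing has arrived
        if not data:
            break
        dst.sendall(data)             # blocks while dst is congested
    try:
        dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
    except OSError:
        pass                          # dst was already closed


def start_pipe(a, b):
    """Spawn the two threads of one pipe, one per direction."""
    threads = [
        threading.Thread(target=pump, args=(a, b), daemon=True),
        threading.Thread(target=pump, args=(b, a), daemon=True),
    ]
    for t in threads:
        t.start()
    return threads


if __name__ == "__main__":
    # Two socket pairs stand in for the two sides of a pipe.
    left, a = socket.socketpair()
    b, right = socket.socketpair()
    start_pipe(a, b)
    left.sendall(b"hello")
    print(right.recv(1024))
```

A thread stuck in recv or sendall holds up only its own direction of its
own pipe; the other threads keep running, which is the independence
described above.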
Using threads gives me VERY simple code. To achieve this with poll /
select would be much more difficult. And to guarantee concurrency and
maximal throughput for all of the pipes I would probably have to mirror
code from the Linux TCP stack (I mean window shifting, data
acknowledgement, retransmission queues). Or perhaps I exaggerate.