[medusa] any performance info?

Tue, 23 Nov 1999 21:11:07 -0800 (PST)

Stuart Zakon writes:
> Hi, I am new to Medusa and have been intrigued by the difference in
> the asyncore model from the multi-thread model I've worked with for
> 10 years. From what I understand the model works well when the
> units of work can be broken into relatively uniform durations.
> Is the unit of work an HTTP exchange?

The base 'unit' might be considered the HTTP request; at least, that's
the way the HTTP server in the distribution is designed. However, the
actual I/O is done 'on demand'... a threaded perspective might say
that the I/O is handled in its own thread[s].
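
As a rough illustration of I/O done 'on demand', here is a minimal
sketch of a single-threaded readiness loop. It uses Python's standard
selectors module, not Medusa's own asyncore loop, and the name
run_echo_once and the socketpair() stand-ins for client connections
are assumptions made up for the example.

```python
import selectors
import socket


def run_echo_once(messages):
    """Serve one request per connection from a single-threaded readiness loop."""
    sel = selectors.DefaultSelector()
    # socketpair() stands in for accepted client connections (an assumption
    # made so the sketch runs without opening a listening port).
    pairs = [socket.socketpair() for _ in messages]
    for index, ((client, server), message) in enumerate(zip(pairs, messages)):
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ, data=index)
        client.sendall(message)

    pending = len(pairs)
    while pending:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready; give up rather than spin
        for key, _mask in events:
            # Touch a socket only when the OS reports that it is readable.
            request = key.fileobj.recv(4096)
            key.fileobj.sendall(request.upper())
            sel.unregister(key.fileobj)
            pending -= 1

    replies = []
    for client, server in pairs:
        client.settimeout(1.0)
        replies.append(client.recv(4096))
        client.close()
        server.close()
    sel.close()
    return replies
```

No thread is ever parked waiting on a slow client; the loop simply
moves on to whichever socket is ready next.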

Some systems based on Medusa take different approaches - for example,
Zope uses a thread pool to handle requests, but lets Medusa handle I/O
asynchronously. This frees up the precious resource of threads to do
actual work, and lets the back end handle all those slow clients.
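
The general shape of that split can be sketched as follows. This is
not Zope's or Medusa's code: it assumes today's standard asyncio and
concurrent.futures modules, and handle_request, serve_connection and
demo are hypothetical names. The event loop does the socket reads and
writes, and only the request handling itself occupies a pool thread.

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_request(request):
    # Stand-in for blocking application work done on a pool thread.
    return b"handled: " + request


async def serve_connection(sock, pool):
    loop = asyncio.get_running_loop()
    request = await loop.sock_recv(sock, 4096)           # async I/O
    response = await loop.run_in_executor(pool, handle_request, request)
    await loop.sock_sendall(sock, response)              # async I/O


def demo(requests):
    # socketpair() stands in for real client connections (an assumption).
    pairs = [socket.socketpair() for _ in requests]
    for (client, server), request in zip(pairs, requests):
        server.setblocking(False)
        client.sendall(request)

    async def main():
        # Two worker threads serve any number of connections.
        with ThreadPoolExecutor(max_workers=2) as pool:
            await asyncio.gather(
                *(serve_connection(server, pool) for _client, server in pairs)
            )

    asyncio.run(main())
    replies = []
    for client, server in pairs:
        client.settimeout(1.0)
        replies.append(client.recv(4096))
        client.close()
        server.close()
    return replies
```

A slow client ties up only a socket in the event loop, never one of
the worker threads.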

> Are there performance comparisons of Medusa's asyncore model with web
> servers using multi-threaded models? Under heavy load?

The very best web servers (which in general provide performance waaaay
over what the average user needs) combine the two approaches; for
example using one thread per CPU, each doing event-driven I/O.

Here are a couple of good references:

http://www.acme.com/software/thttpd/benchmarks.html
http://www.kegel.com/c10k.html

Medusa can't handle the kind of loads that thttpd and Zeus can - for
example, the file delivering code uses stdio, and... well, it *is*
written in Python. 8^) But most people don't need 1000
hits/sec. [let's see.. that's 86 million hits/day?]
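
The back-of-the-envelope figure checks out; spelled out in Python:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

hits_per_second = 1000
hits_per_day = hits_per_second * SECONDS_PER_DAY

print(f"{hits_per_day:,} hits/day")  # 86,400,000 hits/day
```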

One big problem with the Unix select/poll model is that it *does* get
inefficient once the # of connections gets large... it would be nice
if Unix were to add something like completion ports. I think there
are folks adding an equivalent facility to Linux 2.3 [of course, it
has to be completely different, how embarrassing to copy something
from NT. 8^)]
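
To make the cost concrete, here is a small sketch (the function name
poll_once and the use of socketpair() for connections are assumptions
for illustration). Every call to select() is handed the complete set
of descriptors, even when only one of them has anything to report, so
the per-call work grows with the number of connections rather than
with the number of active ones.

```python
import select
import socket


def poll_once(idle_count):
    """Return (sockets handed to select, sockets reported readable)."""
    idle = [socket.socketpair() for _ in range(idle_count)]
    busy_client, busy_server = socket.socketpair()
    busy_client.sendall(b"x")  # only this one connection has data waiting

    # select() is given the whole interest set on every call, and the set
    # has to be examined even though almost all of it is idle.
    watched = [server for _client, server in idle] + [busy_server]
    readable, _, _ = select.select(watched, [], [], 1.0)

    result = (len(watched), len(readable))
    for client, server in idle + [(busy_client, busy_server)]:
        client.close()
        server.close()
    return result
```

With poll_once(50), 51 sockets are passed in and 1 comes back
readable; a completion-style interface would report just that one
without the caller resubmitting the other 50 each time.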

> Also, I heard that egroups uses the asyncore library to achieve
> scalability. Is this so? Does egroups use Medusa?

We have several scalable systems using Medusa for various things -
usually we have a custom system built on top of asyn{core,chat} which
uses the rest of Medusa for a web interface.

Probably the biggest 'show-off' piece is our mailing-list exploder,
which juggles up to 10,000 SMTP client connections simultaneously.
[actually, that's not the maximum it's capable of, we're just not fond 
of pushing the OS to the breaking point. 8^) The servers are
configured with an FD_SETSIZE of 16K] These servers are capable of
pushing 6-8 million msgs/day.

We also have customized RPC servers, proxies, SMTP servers, and other
things based on the async lib.

eGroups does not use Medusa for the front-end, though - all of the UI
code is written in straightforward Python, and is not amenable to the
async model [all our UI programmers would have to know how to do nasty
callbacks, state machines, etc... to pull this off]

Lately we have been migrating some of our back-end systems toward a
coroutine-based solution. This combines the scalability/efficiency of
async I/O with the ability to program straight-line code. It feels a
lot like a multi-threaded system, but without the need to use locks.
[some of our engineers call it 'co-operative multitasking'] We're in
the process of releasing this code to the public.
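
For a feel of the style, here is a minimal sketch using the asyncio
module from today's Python standard library. It is not the eGroups
coroutine code mentioned above, and deliver, explode and the stats
dictionary are hypothetical names. Each coroutine reads as
straight-line code, and because control changes hands only at an
await, the shared counter is updated without a lock.

```python
import asyncio


async def deliver(recipient, stats):
    # Straight-line code: each step reads top to bottom, no callbacks.
    await asyncio.sleep(0)  # stand-in for waiting on the network
    # Control changes hands only at an await, so this read-modify-write
    # cannot be interleaved with another coroutine and needs no lock.
    stats["sent"] = stats["sent"] + 1
    return recipient


async def explode(recipients):
    stats = {"sent": 0}
    delivered = await asyncio.gather(*(deliver(r, stats) for r in recipients))
    return delivered, stats["sent"]
```

Running asyncio.run(explode([...])) with a list of addresses returns
the delivered list in input order along with the final count.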

-Sam