Re: [Mailman-Developers] (no subject)

Dec. 11, 2000

      At 7:51 PM -0800 12/11/00, J C Lawrence wrote:
...
ObTheme: All config files should be human readable unless those
files are dynamically created and contain data which will be easily
and automatically recreated.
ObTheme: All configuration should be possible via the web, even if
the system is misconfigured and non-functional. Anything that can NOT
be safely reconfigured without breaking the system should not be
configurable via the web. (in other words, anything you can change,
you should be able to change remotely, unless you can break the
system. If you can break the system, you shouldn't be allowed near it
trivially...)
...

1) Using multiple simultaneous processes/threads to parallelise
   a given task.

2) Using multiple systems running parallel to parallelise a given
   task.

3) Using multiple systems, each one dedicated to some portion(s)
   or sub-set of the overall task (might be all working in
   parallel on the entire problem (lock contention! failure
   modes!)).

that's my model perfectly, although I think 2 and 3 are reversed.
it's cleaner architecturally to go to divesting and distributing
functionality before 'clustering'. In fact, I'm not sure clustering
(which I'll use as the term for multiple mailman systems running in
parallel) implies a system really, really large, when you realize that
the primary resource eaters (like delivery) can effectively be
infinitely distributed. I'm not sure how big a Mailman system you'd
need to require parallelizing the core process, as long as you can
divest off other pieces to a farm that could grow without bounds. So
maybe we don't need that next (complicated) step, and can instead make
it parallelized and distributable for everything except that core
control process, but manage the complexity of that control process to
keep everything out of it except the absolute necessities.
...
Observation: MLMs are primarily IO bound devices, and are
specifically IO bound on output.  Internal processing on mail
servers, even given crypto authentication and expensive membership
generation processes (eg heavy SQL DB joins etc) are an order of
magnitude smaller problem than just getting the outbound mail off
the system.
some of that is the MTA's problem, actually, but they get tied
together. you don't, for instance, want an MLM that will dump 50K
pieces of email an hour into the queues of an MTA that can only
process 40K...
But in general, you're correct. Especially if you define DNS delays
and SMTP protocol delays caused by the receiving machine to be
"output" (grin)
...
Sites with large numbers of lists with large numbers of members (and
presumably large numbers of messages per list) are the pessimal
case, and that case is not one Mailman is currently targeting to solve.
but if you define the distribution capabilities correctly, this case
is solved by throwing even more hardware at it, and the owners of
this pessimal case presumably have a budget for it. If you see
someone trying to run Sourceforge on a 486 and a 128K DSL line, you
laugh at them.
...
Observation: Traffic bursts are bad.  Minimally the MLM should
attempt to smooth out delivery rates to a given MTA to be no higher
than N messages/time.
The obverse of that is that end-users seriously dislike delays,
especially on conversational lists. It turns into the old "user
expectation" problem -- it's better to hold ALL mail for 15 minutes
so users come to expect it than to normally deliver mail in 2
minutes, except during the worst bulges... But in general, the MLM
should deliver as fast as it reasonably can without overloading the
MTA, which implies some kind of monitoring setup for the MTA, or some
user-controlled throttling system. the latter, unfortunately, implies
teaching admins how to monitor and adjust, a support issue. The
former implies writing an interface for every MTA -- a development
AND support issue.
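
As a rough illustration of the user-controlled throttling option, here
is a minimal token-bucket sketch in Python. It is not anything Mailman
ships; the class name DeliveryThrottle and the parameters rate, burst
and clock are all made up for the example. The admin picks N
messages/second and a burst size, and the delivery loop asks the
throttle before each handoff to the MTA.

```python
import time


class DeliveryThrottle:
    """Token bucket: on average at most `rate` messages per second,
    with bursts of up to `burst` messages. All names are hypothetical."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens (messages) added per second
        self.burst = float(burst)    # bucket capacity
        self.clock = clock           # injectable so it can be tested
        self.tokens = float(burst)   # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped
        # at the bucket capacity.
        now = self.clock()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self):
        """Return True if one message may be handed to the MTA now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until the next message may be sent (0.0 if allowed now)."""
        self._refill()
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.rate
```

A delivery loop would call try_send() before each handoff and sleep for
wait_time() when it returns False. This only covers the admin-tuned
case; the monitoring case would still need something MTA-specific to
feed the rate.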
...
20Million messages sitting in the outbound queue), that the MLM will
give the MTA the opportunity to try and react intelligently rather
than overwhelming it near instantly with all 20M messages dumped in
the MTA spool over 30 seconds while the spool filesystem gags.
I will not make comments about qmail. I will not make comments about
qmail. I will be good. I will be good. (grin)
...

1) Receipt of message by local MTA

1a) passthrough of message via a security wrapper from MTA to list
server... (I think it's important we remember that, because we can't
lose it, and it involves a layer of passthrough and a process
spawning, so it's somewhat heavyweight -- but indispensable)
...

2) Receipt by list server
3) Approval/editing/moderation
4) Processing of message and emission of any resultant message(s)
5) Delivery of message to MTA for final delivery.

6) delivery of message to non-MTA recipients (the archiver, the
logging thing, the digester, the bounce processor....)
...
#1 is significant only because we can rely on the MTA to
distinguish between valid list-related addresses and non-list
addresses.
although one thing I've toyed with is to give a subdomain to the MLM,
and simply pass everything to it (in sendmail terms, using
virtusertable to pass @list.foo.bar to mailman@foo.bar). Then you
take the MTA out of having to know what lists exist, along with the
administrative work needed to keep that interface in sync. The
downside is it doesn't fit the design of some users (but that can be
fixed by education if we can prove why it's better), and the MLM gets
into having to handle some MTA functions, such as DSN compatible
bounce messages.
I've more or less decided that when I rewrite my internal corporate
mail list, I'll do that rather than generate alias listings (for, oh,
12,000 groups) and the hassles and overheads of all that. That'll be
especially useful if we do what I hope, which is set it up so the
server has no data at all, but authenticates via LDAP to get list
information on demand out of the corporate databases. There are some
definite advantages to not knowing whether something exists until the
need to know exists -- and as Mailman starts edging towards
interfacing to non-Mailman data sources for list information, that
ability grows in importance.
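
To make the "don't know a list exists until you need to know" idea
concrete, here is a small Python sketch. Everything in it is
hypothetical: resolve_recipient, UnknownList and the lookup callable
are names invented for the example, and lookup merely stands in for
whatever LDAP or database query a site would actually use. The MLM
receives everything addressed to the subdomain, asks the directory on
demand, and has to produce the bounce itself when the answer is "no
such list".

```python
from email.utils import parseaddr


class UnknownList(Exception):
    """Raised when there is no such list; the caller should bounce."""


def resolve_recipient(address, list_domain, lookup):
    """Map an envelope recipient to (list_name, members) on demand.

    `lookup` is a hypothetical callable standing in for the LDAP or
    database query. It takes a lowercased list name and returns a list
    of member addresses, or None if no such list exists.
    """
    _, addr = parseaddr(address)
    local, _, domain = addr.rpartition("@")
    # Anything outside the list subdomain (or with no local part) is
    # not ours to handle.
    if not local or domain.lower() != list_domain.lower():
        raise UnknownList(address)
    members = lookup(local.lower())
    if members is None:
        raise UnknownList(address)
    return local.lower(), members
```

With a plain dict standing in for the directory,
resolve_recipient("Staff@list.foo.bar", "list.foo.bar", directory.get)
returns the list name and its members, and an address for a list that
is not in the directory raises UnknownList, which is the point where
the MLM takes over the MTA's job of generating the bounce.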

(6) is the processing needed to support other functions that act on
messages. The idea is that instead of delivering to the MTA, we have
a suite of functions that deliver the message to whatever needs to
process it. Those can be asynchronous and don't need to be as timely
as (5), and have different enough design needs that I split them out
from the MTA delivery (although traditionally, stuff like digests is
managed by doing an MTA transfer out of the MLM and back in to a
different program...)

It also assumes that these non-delivery things are separate processes
from the act of making messages available to those things, to keep (6)
as lightweight as possible.
...
Note: Bounce processing and request processing are not detailed at
this point as their rate of occurrence outside of DoS attacks is
comparatively low and they are far cheaper than list broadcasts in
general.
and besides, they are basically independent, asynchronous processes
that don't need to be managed by any of the core logic, other than
handing messages into their queue and making sure they stay running.
same with, IMHO, storing messages for archives, storing messages for
digests, updating archives, processing digests (but the processed
digest is fed back into the core logic for delivery), and whatever
else we decide it needs to do that isn't part of the core,
time-sensitive code base. (in fact, there's no reason why you
couldn't have multiple flavors of these things, feeding archives into
an mbox, another archiver into mhonarc or pipermail, something that
updates the search engine indexes, and text and mime digesters... by
turning them into their own logic streams with their own queues, you
effectively have just made them all plug-in swappable, because you're
writing to a queue, and not worrying about what happens once it's
there. you merely need to make sure it goes in the right queue, in
the approved format.)
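
A minimal sketch of that "write it to a queue and forget it" handoff,
assuming each consumer (archiver, digester, bounce processor, ...)
owns a spool directory. The function names enqueue and fan_out, the
directory layout and the .tmp/.msg suffixes are all invented for
illustration and are not Mailman's actual queue format.

```python
import os
import uuid


def enqueue(queue_dir, message_bytes):
    """Drop one message into a directory-based queue; return its path.

    The file is written under a temporary name and then renamed, so a
    consumer scanning for *.msg never sees a half-written message.
    """
    os.makedirs(queue_dir, exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = os.path.join(queue_dir, name + ".tmp")
    final_path = os.path.join(queue_dir, name + ".msg")
    with open(tmp_path, "wb") as fh:
        fh.write(message_bytes)
    os.rename(tmp_path, final_path)
    return final_path


def fan_out(spool_root, queue_names, message_bytes):
    """Hand the same message to every configured non-MTA queue."""
    return [enqueue(os.path.join(spool_root, q), message_bytes)
            for q in queue_names]
```

The core only ever calls something like fan_out(spool, ["archive",
"digest"], raw_message). Swapping pipermail for mhonarc, or adding a
search indexer, then means pointing a different consumer at a queue
directory (or adding a queue name) without touching the core.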
...
We don't want an over-arching API, or the attempt to solve the
entire problem with either one hammer, or one sort of hammer.
I like hammers! My thumb doesn't, not since the divorce, at least...
kewl. good stuff here.
--
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)
We're visiting the relatives. Cover us.