At 7:51 PM -0800 12/11/00, J C Lawrence wrote:
ObTheme: All config files should be human-readable unless those files are dynamically created and contain data which will be easily and automatically recreated.
ObTheme: All configuration should be possible via the web, even if the system is misconfigured and non-functional. Anything that can NOT be safely reconfigured without breaking the system should not be configurable via the web. (In other words, anything you can change, you should be able to change remotely, unless you can break the system. If you can break the system, you shouldn't be allowed near it trivially...)
1) Using multiple simultaneous processes/threads to parallelise a given task.
2) Using multiple systems running in parallel to parallelise a given task.
3) Using multiple systems, each one dedicated to some portion(s) or subset of the overall task (they might all be working in parallel on the entire problem: lock contention! failure modes!).
That's my model perfectly, although I think 2 and 3 are reversed. It's cleaner architecturally to divest and distribute functionality before 'clustering'. In fact, I'm not sure clustering (which I'll use to mean multiple Mailman systems running in parallel) is needed for anything short of a really, really large system, once you realize that the primary resource eaters (like delivery) can effectively be distributed without limit. I'm not sure how big a Mailman system you'd need to require parallelizing the core process, as long as you can divest the other pieces to a farm that can grow without bounds. So maybe we don't need that next (complicated) step: make everything parallelized and distributable except the core control process, and manage the complexity of that control process by keeping everything out of it except the absolute necessities.
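A minimal sketch of that split, assuming Python and a thread pool standing in for the delivery farm (in a real deployment the workers would be separate machines pulling from a shared queue). The core control process only slices the recipient list into batches; everything IO-heavy happens inside the hypothetical `deliver_batch` callable. The names `chunk`, `fan_out_delivery`, and `deliver_batch` are illustrative, not Mailman APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List

# Hypothetical signature: deliver one batch, return how many were accepted.
DeliverBatch = Callable[[bytes, List[str]], int]


def chunk(recipients: List[str], size: int) -> Iterator[List[str]]:
    """Split the recipient list into batches of at most `size` addresses."""
    for start in range(0, len(recipients), size):
        yield recipients[start:start + size]


def fan_out_delivery(message: bytes, recipients: List[str],
                     deliver_batch: DeliverBatch,
                     workers: int = 4, batch_size: int = 100) -> int:
    """Core-process side: cheap bookkeeping only. The IO-bound SMTP work
    runs in the pool, which stands in for a farm that can keep growing."""
    with ThreadPoolExecutor(max_workers=workers) as farm:
        futures = [farm.submit(deliver_batch, message, batch)
                   for batch in chunk(recipients, batch_size)]
        # result() re-raises any worker exception here in the core process.
        return sum(f.result() for f in futures)
```

Growing capacity then means adding workers (or machines) to the farm; the core process itself does not change.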
Observation: MLMs are primarily IO-bound devices, and are specifically IO-bound on output. Internal processing on mail servers, even counting crypto authentication and expensive membership generation (e.g. heavy SQL DB joins), is an order of magnitude smaller problem than just getting the outbound mail off the system.
Some of that is the MTA's problem, actually, but the two get tied together. You don't, for instance, want an MLM that will dump 50K pieces of email an hour into the queues of an MTA that can only process 40K...
But in general, you're correct. Especially if you define DNS delays and SMTP protocol delays caused by the receiving machine to be "output" (grin)
Sites with large numbers of lists with large numbers of members (and presumably large numbers of messages per list) are the pessimal case, and it is not a case Mailman is currently trying to solve.
But if you define the distribution capabilities correctly, this case is solved by throwing even more hardware at it, and the owners of this pessimal case presumably have a budget for it. If you see someone trying to run Sourceforge on a 486 and a 128K DSL line, you laugh at them.
Observation: Traffic bursts are bad. Minimally the MLM should attempt to smooth out delivery rates to a given MTA to be no higher than N messages/time.
The obverse of that is that end users seriously dislike delays, especially on conversational lists. It turns into the old "user expectation" problem: it's better to hold ALL mail for 15 minutes so users come to expect it than to normally deliver mail in 2 minutes, except during the worst bulges... But in general, the MLM should deliver as fast as it reasonably can without overloading the MTA, which implies either some kind of monitoring setup for the MTA or some user-controlled throttling system. The latter, unfortunately, implies teaching admins how to monitor and adjust, a support issue. The former implies writing an interface for every MTA, a development AND support issue.
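The user-controlled throttling option can be sketched as a token bucket: the admin picks a rate N (messages per second) and a burst allowance, and the MLM never hands one MTA more than that. This is a sketch under stated assumptions; `TokenBucket`, `send_all`, and `hand_to_mta` are hypothetical names, and the clock and sleep functions are injectable only so the pacing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable


class TokenBucket:
    """Admin-set cap: on average at most `rate` messages/second to one MTA,
    with bursts of up to `capacity` messages."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: a small burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Accrue tokens for the time elapsed since the last check.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Spend one token if one is available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until at least one whole token will be available."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.rate)


def send_all(messages: Iterable[object], bucket: TokenBucket,
             hand_to_mta: Callable[[object], None],
             sleep: Callable[[float], None] = time.sleep) -> None:
    """Hand messages to the MTA no faster than the bucket allows."""
    for msg in messages:
        while not bucket.try_acquire():
            sleep(bucket.wait_time())
        hand_to_mta(msg)
```

The fixed cap never overloads the MTA, but it also never speeds up when the MTA has spare capacity; choosing N is exactly the monitor-and-adjust job the admin has to learn.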
20Million messages sitting in the outbound queue), that the MLM will give the MTA the opportunity to try and react intelligently rather than overwhelming it near instantly with all 20M messages dumped in the MTA spool over 30 seconds while the spool filesystem gags.
I will not make comments about qmail. I will not make comments about qmail. I will be good. I will be good. (grin)
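The monitoring alternative could look like the feedback loop below, which borrows the additive-increase/multiplicative-decrease (AIMD) idea from TCP congestion control: creep the hand-off rate up while the MTA's queue stays short, and halve it when the queue backs up, giving the MTA room to react instead of burying it. It assumes a per-MTA `queue_depth` probe (hypothetical here); writing one of those for every MTA is the development and support cost described above. `AdaptiveRate` and its parameters are illustrative names.

```python
from typing import Callable


class AdaptiveRate:
    """AIMD controller for the MLM -> MTA hand-off rate (messages/second).
    `queue_depth` is a hypothetical probe returning how many messages are
    waiting in the MTA's queue; one such probe would be needed per MTA."""

    def __init__(self, queue_depth: Callable[[], int], high_water: int,
                 min_rate: float = 1.0, max_rate: float = 1000.0,
                 step: float = 10.0, backoff: float = 0.5) -> None:
        self.queue_depth = queue_depth
        self.high_water = high_water
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.step = step
        self.backoff = backoff
        self.rate = min_rate            # start cautiously

    def update(self) -> float:
        """Sample the MTA once and return the new target rate."""
        if self.queue_depth() > self.high_water:
            # MTA is falling behind: back off hard so it can catch up.
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            # MTA is keeping up: probe gently for more throughput.
            self.rate = min(self.max_rate, self.rate + self.step)
        return self.rate
```

The asymmetry (slow climb, fast retreat) is the point: a bulge gets smoothed out over time rather than landing in the MTA spool all at once.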
1) Receipt of message by local MTA
1a) Passthrough of message via a security wrapper from MTA to list server... (I think it's important we remember that, because we can't lose it, and it involves a layer of passthrough and a process spawning, so it's somewhat heavyweight, but indispensable)
2) Receipt by list server
3) Approval/editing/moderation
4) Processing of message and emission of any resultant message(s)
5) Delivery of message to MTA for final delivery.
6) Delivery of message to non-MTA recipients (the archiver, the logging thing, the digester, the bounce processor...)
#1 is significant only because we can rely on the MTA to distinguish between valid list-related addresses and non-list addresses.
One thing I've toyed with, though, is giving a subdomain to the MLM and simply passing everything to it (in sendmail terms, using virtusertable to pass @list.foo.bar to mailman@foo.bar). That takes the MLM out of having to know what lists exist, and removes the administrative need to keep that interface in sync. The downside is that it doesn't fit the design of some users (but that can be fixed by education if we can prove why it's better), and you get into having to handle some MTA functions, such as DSN-compatible bounce messages. I've more or less decided that when I rewrite my internal corporate mail list, I'll do that rather than generate alias listings (for, oh, 12,000 groups) with all the hassles and overheads that go with them. That'll be especially useful if we do what I hope, which is to set it up so the server has no data at all, but authenticates via LDAP to get list information on demand out of the corporate databases. There are some definite advantages to not knowing whether something exists until the need to know arises, and as Mailman starts edging towards interfacing to non-Mailman data sources for list information, that ability grows in importance.
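A rough sketch of the catch-all approach, assuming the MTA hands every `*@list.foo.bar` address to the MLM unchanged. The MLM parses the local part itself and asks a directory whether the list exists only when mail actually arrives; `fetch` is a hypothetical stand-in for the LDAP query. The role-suffix set is an assumption modelled on per-list `-request`/`-owner` style addresses, and an unknown list yields `"bounce"` because in this setup the MLM has to generate the DSN itself. All names are illustrative.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed set of per-list role suffixes (name-request@, name-owner@, ...).
ROLE_SUFFIXES = ("request", "admin", "owner", "bounces")


def parse_list_address(address: str, list_domain: str) -> Optional[Tuple[str, str]]:
    """Split 'name[-role]@list_domain' into (list name, role).
    Returns None for addresses outside the list subdomain."""
    local, at, domain = address.rpartition("@")
    if not at or domain.lower() != list_domain.lower():
        return None
    local = local.lower()
    for suffix in ROLE_SUFFIXES:
        if local.endswith("-" + suffix):
            return local[: -(len(suffix) + 1)], suffix
    return local, "post"


class ListDirectory:
    """Looks lists up only when mail for them arrives. `fetch` stands in for
    a directory query such as LDAP (hypothetical) and returns None for an
    unknown name. Results, including misses, are cached for the life of the
    object; a real deployment would want the cache entries to expire."""

    def __init__(self, fetch: Callable[[str], Optional[dict]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Optional[dict]] = {}

    def resolve(self, name: str) -> Optional[dict]:
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
        return self._cache[name]


def route(address: str, list_domain: str, directory: ListDirectory) -> str:
    parsed = parse_list_address(address, list_domain)
    if parsed is None:
        return "not-ours"
    name, role = parsed
    if directory.resolve(name) is None:
        return "bounce"        # no such list: the MLM must build the DSN itself
    return f"{role}:{name}"
```

No alias file exists anywhere: 12,000 groups cost nothing until one of them actually receives mail.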
(6) is the processing needed to support other functions that act on messages. The idea is that instead of delivering to the MTA, we have a suite of functions that deliver the message to whatever needs to process it. Those can be asynchronous, don't need to be as timely as (5), and have different enough design needs that I split them out from MTA delivery (although traditionally, things like digests are managed by doing an MTA transfer out of the MLM and back in to a different program...)
It also assumes that these non-delivery things are separate processes from the act of making messages available to them, to keep (6) as lightweight as possible.
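Keeping (6) lightweight can be as simple as a spool-directory hand-off. A minimal sketch, assuming each downstream consumer owns a queue directory on the same filesystem (the directory layout and the `enqueue` name are hypothetical): write the message to a temporary file, then rename it into place, so a consumer never sees a partial file and the core process never waits on the archiver or digester.

```python
import os
import tempfile
from typing import Iterable, List


def enqueue(message: bytes, queue_dirs: Iterable[str], name: str) -> List[str]:
    """Drop one copy of `message` into each consumer's queue directory as
    `name` (which must be unique per message; an existing file of that name
    is overwritten). Each copy is written to a dot-prefixed temp file in the
    target directory and then renamed; os.replace is atomic within one
    filesystem, so consumers that skip dotfiles never see partial data."""
    placed = []
    for qdir in queue_dirs:
        fd, tmp = tempfile.mkstemp(dir=qdir, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(message)
                fh.flush()
                os.fsync(fh.fileno())   # data is on disk before it becomes visible
            final = os.path.join(qdir, name)
            os.replace(tmp, final)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        placed.append(final)
    return placed
```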
Note: Bounce processing and request processing are not detailed at this point, as their rate of occurrence outside of DoS attacks is comparatively low and they are far cheaper than list broadcasts in general.
And besides, they are basically independent, asynchronous processes that don't need to be managed by any of the core logic, other than handing messages into their queue and making sure they stay running. Same with, IMHO, storing messages for archives, storing messages for digests, updating archives, processing digests (but the processed digest is fed back into the core logic for delivery), and whatever else we decide it needs to do that isn't part of the core, time-sensitive code base. (In fact, there's no reason why you couldn't have multiple flavors of these things: one archiver feeding into an mbox, another into MHonArc or Pipermail, something that updates the search engine indexes, and text and MIME digesters... By turning them into their own logic streams with their own queues, you've effectively made them all plug-in swappable, because you're writing to a queue and not worrying about what happens once it's there. You merely need to make sure it goes into the right queue, in the approved format.)
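The consumer side of that idea, sketched under the assumption of one queue directory per consumer, with files that are still being written named with a leading dot. `drain`, `run_consumers`, and the plug-in names are hypothetical. Each plug-in is just a function over queued messages, so swapping an mbox archiver for a Pipermail one, or adding a search indexer, means changing one registry entry without touching the core.

```python
import os
from typing import Callable, Dict

# A plug-in is any function that accepts one queued message.
Handler = Callable[[bytes], None]


def drain(queue_dir: str, handler: Handler) -> int:
    """Process every finished file in one consumer's queue, in name order,
    deleting each only after its handler succeeds. Dotfiles are skipped
    because a producer may still be writing them. Returns the count."""
    done = 0
    for entry in sorted(os.listdir(queue_dir)):
        if entry.startswith("."):
            continue
        path = os.path.join(queue_dir, entry)
        with open(path, "rb") as fh:
            message = fh.read()
        handler(message)     # if this raises, the file stays queued for a retry
        os.unlink(path)
        done += 1
    return done


def run_consumers(root: str, plugins: Dict[str, Handler]) -> Dict[str, int]:
    """One pass over every registered consumer; each owns <root>/<name>."""
    return {name: drain(os.path.join(root, name), handler)
            for name, handler in plugins.items()}
```

Deleting only after the handler returns gives at-least-once processing: a crashed archiver re-reads its queue on restart, and the core never notices.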
We don't want an over-arching API, or the attempt to solve the entire problem with either one hammer, or one sort of hammer.
I like hammers! My thumb doesn't, not since the divorce, at least...
kewl. good stuff here.
-- Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com) Apple Mail List Gnome (mailto:chuq@apple.com)
We're visiting the relatives. Cover us.