Re: [Mailman-Developers] (no subject)
On Mon, 11 Dec 2000 23:43:07 -0800 Chuq Von Rospach chuqui@plaidworks.com wrote:
At 11:15 PM -0800 12/11/00, J C Lawrence wrote:
My intent so far is just "deliver no more than N messages per minute" per outbound queue runner. It knocks the peaks off the problem, and the base structure is easy to extend from there (and I don't want to think about that now).
and leaves it up to the admin to tune. That's probably fine for 3.0. full queue watching and self-throttling can wait. it's nice to have, but we probably shouldn't try to do everything at once. Just leave the hooks for later...
Precisely.
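A minimal sketch of the "no more than N messages per minute" cap, assuming a rolling 60-second window per outbound queue runner. The class and method names are hypothetical, made up for illustration, and are not Mailman code.

```python
import time
from collections import deque


class MinuteThrottle:
    """Allow at most `limit` deliveries in any rolling `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the throttle can be tested
        self.sent = deque()     # timestamps of recent deliveries

    def delay_needed(self):
        """Return how many seconds to wait before the next delivery (0.0 if none)."""
        now = self.clock()
        # Forget deliveries that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Note that one message was just handed to the MTA."""
        self.sent.append(self.clock())
```

A queue runner would call delay_needed(), sleep for that long, deliver, then call record(). The limit is the knob left to the admin; queue watching and self-throttling could later adjust it without changing the callers.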
I should note that my base design is very heavy in terms of process forks (which happen to be quite light weight under Linux, but that's another matter).
There are definitely places for threads, but to be honest, I see some tendency of people to go thread-happy. it's the "new puppy", so everything needs to be designed around threads... Given the amount of I/O we have going on, the fork overhead is going to get lost in the noise in most cases.
That's my hope.
There's a directory full of scripts/programs.
Run them all, in directory sort order, on this message to determine if we should do XXX with it.
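As a rough sketch of that loop: the function below runs every executable in a directory in sorted name order and feeds it the message on stdin. The convention that exit status 0 means "yes, go ahead" and anything else means "no" is my assumption for illustration, as is the function name.

```python
import os
import subprocess


def run_policy_scripts(script_dir, message_bytes):
    """Run each executable in script_dir, in sorted name order, on the message.

    Returns True only if every script exits with status 0 (assumed convention).
    Stops at the first script that says no.
    """
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        # Skip subdirectories and anything not marked executable.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        result = subprocess.run([path], input=message_bytes)
        if result.returncode != 0:
            return False
    return True
```

Numeric prefixes on the file names (10-size-check, 20-spam-check, and so on) would be one way to make the sort order explicit.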
and who does this? this missing core policeman process, of course (grin).
Nope. The individual process which somehow got nominated for picking up a message sitting in a list pending queue. So, it picks up the message, asks for its distribution list, gets it, and shoves them both over into the outbound queue. Later some arbitrary outbound queue processor wins/gets control of that message, opens an SMTP session, and shovels the message down to the list of RCPT TOs.
Nobody is responsible for more than their tiny area of the field. There is a pseudo orchestra leader, but all he really does is fork processes that go see if there is anything in the queues to process, and if so, start on them.
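To make the hand-off concrete, here is a sketch that assumes the queues are plain directories on one filesystem and that a process "wins" a message by renaming it. The file suffixes (.claimed, .rcpt, .msg) and the get_recipients callback are hypothetical stand-ins, not a settled design.

```python
import os


def claim_and_enqueue(pending_dir, outbound_dir, get_recipients):
    """Claim one pending message and move it, with its recipients, to outbound.

    Returns the message name, or None if there was nothing to claim.
    """
    for name in sorted(os.listdir(pending_dir)):
        if name.endswith(".claimed"):
            continue  # some other process already owns this one
        src = os.path.join(pending_dir, name)
        claimed = src + ".claimed"
        try:
            # rename is atomic on POSIX within a filesystem: one process wins.
            os.rename(src, claimed)
        except FileNotFoundError:
            continue  # lost the race, try the next message
        with open(claimed, "rb") as f:
            recipients = get_recipients(f.read())
        # Write the recipient list first, so that it is complete before
        # the message itself shows up in outbound.
        tmp = os.path.join(outbound_dir, name + ".rcpt.tmp")
        with open(tmp, "w") as f:
            f.write("\n".join(recipients) + "\n")
        os.rename(tmp, os.path.join(outbound_dir, name + ".rcpt"))
        os.rename(claimed, os.path.join(outbound_dir, name + ".msg"))
        return name
    return None
```

The orchestra leader then only has to fork processes that call something like this; none of them needs to know about the others.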
but -- I'd suggest against this approach. There are problems. to start, the approach is pretty darn I/O heavy. you'd be better off loading all of this stuff into an internal database, and making it a memory-resident table, not a disk-based one.
Kinda tough for LDAP or SQL where the list of members is dynamic and depends on the message itself (non-traditional lists).
But yes, it hurts. The default case will be some sort of local/cheap DB with a single process. The idea is that the above architecture is there should it be needed.
Administratively, it has some issues as well, since you're more or less requiring that someone with a CLI deal with a lot of the configuration -- or opening you up to all sorts of web-based attacks.
Semi. The idea is that the CLI guy installs the base set of scripts that are potentially available to a given list. The list owner then picks from that library for his list, and assembles and orders them (building a symlink table on disk) via his web interface (drop and combo boxes).
Instead, you store scripts, and the CLI admin manages that process, but configuration is within Mailman, and web based.
Precisely.
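A sketch of what the web interface might do behind the scenes, assuming the library is a directory installed by the CLI admin and each list gets a directory of numbered symlinks. The function name and the NN-name link scheme are invented for illustration.

```python
import os


def build_symlink_table(library_dir, list_dir, chosen):
    """Rebuild list_dir as numbered symlinks to the chosen library scripts.

    `chosen` is the ordered list of script names picked by the list owner.
    Only bare names that exist in library_dir are accepted, so the web side
    can never point a list at a script the CLI admin did not install.
    """
    library_dir = os.path.abspath(library_dir)
    # Validate everything before touching the existing table.
    for script in chosen:
        if os.path.basename(script) != script:
            raise ValueError("not a bare script name: %r" % script)
        if not os.path.isfile(os.path.join(library_dir, script)):
            raise ValueError("not in the library: %r" % script)
    os.makedirs(list_dir, exist_ok=True)
    for name in os.listdir(list_dir):
        path = os.path.join(list_dir, name)
        if os.path.islink(path):
            os.unlink(path)
    for index, script in enumerate(chosen):
        link = os.path.join(list_dir, "%02d-%s" % (index, script))
        os.symlink(os.path.join(library_dir, script), link)
```

The numeric prefix keeps the owner's chosen order intact when the directory is later processed in sort order.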
I've been working on a new API for the moderator/autobounce/admin/anti-spam stuff. I'll post that in a day or so, what I have, because I think the way I'm putting it together is relevant to how I think the overall control system could be done.
I haven't really thought about bounce processing at all yet.
You want to embed nothing (IMHO), because it reduces the complexity of all of the pieces and forces you to keep the interfaces clean and rigorous.
Yeah.
I don't see the different queues needing markedly different designs, but they do need their supporting processes to be cleanly divisible. The base structures end up markedly similar after that.
Other than, say, imagining a system where archives are on a different machine (or two), and the search engine on a third (or fourth), so you want to be able to distribute the processing cleanly.... And the realization that archives and digest stuff can be held in a low-priority queue and turned into idle-time processing tasks. A big plus if you've got a busy system a little closer to the edge than you like.
I haven't thought about system load sensitivities yet, but I don't see any innate reason they couldn't be another variable thrown into the, "What am I currently allowed to process" equation.
Process fork overhead is a problem I've not confronted yet.
And I wouldn't worry about it much. don't think it's going to be a problem, other than in the MLM->MTA interface where you might be doing a lot of spawning and forking to parallelize, VERP, or whatever.
My idea for VERP is trivially simple:
The member script which generates the list of RCPT TOs which are attached to a pending message will periodically add a second token (a hash value) after the email address, separated by whitespace.
Note: instead of text a DBM would work just as well, perhaps better.
The process that then picks up a message from outbound notices the hash token and constructs a special envelope for that address only, using the hash string as +suffix to the envelope return address.
Want VERP all the time? Members always generates hash values. Or just a percentage of the time, or as a function of how long it was since we last caught a bounce from that address, or as a function of how much we like that domain.
The idea is that VERPed messages are built on the instant of handing them off to an MTA.
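Roughly, in code: the sketch below assumes one recipient per line, with an optional whitespace-separated hash token after the address, and a list bounce address to hang the +suffix on. All the names and addresses here are hypothetical.

```python
def parse_recipient_line(line):
    """Split 'address [token]' into (address, token or None)."""
    parts = line.split()
    address = parts[0]
    token = parts[1] if len(parts) > 1 else None
    return address, token


def envelope_sender(list_bounce_address, token):
    """Return the envelope return address, with +token added when VERPing."""
    if token is None:
        return list_bounce_address
    local, _, domain = list_bounce_address.rpartition("@")
    return "%s+%s@%s" % (local, token, domain)


def plan_deliveries(recipient_lines, list_bounce_address):
    """Separate plain recipients from those needing their own envelope.

    Returns (shared, individual): `shared` is the list of addresses that can
    go out under the normal envelope, `individual` is a list of
    (envelope_sender, address) pairs, one special envelope per address.
    """
    shared = []
    individual = []
    for line in recipient_lines:
        if not line.strip():
            continue
        address, token = parse_recipient_line(line)
        if token is None:
            shared.append(address)
        else:
            individual.append((envelope_sender(list_bounce_address, token), address))
    return shared, individual
```

How often the member script emits a token (always, a percentage, by bounce history, by domain) stays entirely on the member side; the outbound process only reacts to whether the token is there.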
And that can be minimized and avoided with some careful design. In the rest of the system, don't bother. When I'm talking about lightweight, I mean code complexity and feature creep. You want to stuff as much as possible into external code pieces that are brought in via queueing and messaging, and keep it out of the control piece.
Bingo.
BTW I'd like to have the MLM archive messages such that a member can request, "SEND ME POST XXX" and have the MLM send it to him. Ditto for digests. This is in addition to any web archiving.
and another flavor of digest, what I call the HTML-TOC. Simply a message full of digest info (poster, subject, maybe the first couple of lines), and a URL to pull it out of archives. Some folks want a digest to skim, some folks only want header data -- so why send all those bytes that won't be read?
Ahh, excellent point. Digests really should be an OOB process handled by their own queue. Yup. Absolutely.
-- J C Lawrence claw@kanga.nu ---------(*) : http://www.kanga.nu/~claw/ --=| A man is as sane as he is dangerous to his environment |=--