Re: [Mailman-Developers] (no subject)

12 Dec 2000


On Mon, 11 Dec 2000 23:43:07 -0800
Chuq Von Rospach <chuqui@plaidworks.com> wrote:
...
At 11:15 PM -0800 12/11/00, J C Lawrence wrote:
...
...
My intent so far is just "deliver no more than N messages per
minute" per outbound queue runner.  It knocks the peaks off the
problem, and the base structure is easy to extend from there (and
I don't want to think about that now).
...
and leaves it up to the admin to tune. That's probably fine for
3.0. full queue watching and self-throttling can wait.  it's nice
to have, but we probably shouldn't try to do everything at
once. Just to leave the hooks for later...
Precisely.
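
A minimal sketch of the "no more than N messages per minute" cap for
one outbound queue runner. The names MinuteThrottle and drain are
hypothetical, and the rolling 60-second window is an assumption; the
thread only fixes the limit itself and leaves N to the admin.

```python
import time


class MinuteThrottle:
    """Allow at most `limit` deliveries in any rolling 60-second window."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.sent = []  # timestamps of recent deliveries, oldest first

    def wait_time(self):
        """Seconds to wait before the next delivery is allowed (0 if none)."""
        now = self.clock()
        # Forget deliveries that have fallen out of the 60-second window.
        self.sent = [t for t in self.sent if now - t < 60.0]
        if len(self.sent) < self.limit:
            return 0.0
        return 60.0 - (now - self.sent[0])

    def record(self):
        """Note that one message was just handed to the MTA."""
        self.sent.append(self.clock())


def drain(messages, deliver, throttle, sleep=time.sleep):
    """Deliver messages in order, pausing whenever the cap is reached."""
    for msg in messages:
        delay = throttle.wait_time()
        if delay > 0:
            sleep(delay)
        deliver(msg)
        throttle.record()
```

Because the clock and sleep functions are passed in, later queue
watching or self-throttling could replace them without touching the
delivery loop, which is roughly what "leave the hooks" asks for.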
...
...
I should note that my base design is very heavy in terms of
process forks (which happen to be quite light weight under Linux,
but that's another matter).
...
There are definitely places for threads, but to be honest, I see
some tendency of people to go thread-happy. it's the "new puppy",
so everything needs to be designed around threads... Given the
amount of I/O we have going on, the fork overhead is going to get
lost in the noise in most cases.
That's my hope.
...
...
There's a directory full of scripts/programs.
Run them all, in directory sort order, on this message to
determine if we should do XXX with it.
...
and who does this? this missing core policeman process, of course
(grin).
Nope.  The individual process which somehow got nominated for
picking up a message sitting in a list pending queue.  So, it picks
up the message, asks for its distribution list, gets it, and shoves
them both over into the outbound queue.  Later some arbitrary
outbound queue processor wins/gets control of that message, opens an
SMTP session, and shovels the message down to the list of RCPT TOs.
Nobody is responsible for more than their tiny area of the field.
There is a pseudo orchestra leader, but all he really does is fork
processes that go see if there is anything in the queues to process,
and if so, start on them.
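
The "run them all, in directory sort order" step quoted above could
look roughly like the sketch below. The function name run_checks is
hypothetical, and so is the calling convention (message on stdin,
non-zero exit status meaning "stop"); the thread does not specify how
a script reports its verdict.

```python
import os
import subprocess


def run_checks(script_dir, message_bytes):
    """Run each executable in script_dir, in sorted name order, on one message.

    Assumed convention (not from the thread): the message is passed on
    stdin and a non-zero exit status means "do not do XXX".
    Returns (True, None) if every script passed, else (False, script_name).
    """
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # skip subdirectories and non-executable files
        result = subprocess.run([path], input=message_bytes)
        if result.returncode != 0:
            return False, name  # first failing script decides
    return True, None
```

Whichever process picks the message out of the pending queue would
call this once and act on the result; no central process is involved.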
...
but -- I'd suggest against this approach. There are problems. to
start, the approach is pretty darn I/O heavy. you'd be better off
loading all of this stuff into an internal database, and making it
a memory-resident table, not a disk-based.
Kinda tough for LDAP or SQL where the list of members is dynamic
and depends on the message itself (non-traditional lists).
But yes, it hurts.  The default case will be some sort of
local/cheap DB with a single process.  The idea is that the above
architecture is there should it be needed.
...
Administratively, it has some issues as well, since you're more or
less requiring that someone with a CLI deal with a lot of the
configuration -- or opening you up to all sorts of web-based
attacks.
Semi.  The idea is that the CLI guy installs the base set of scripts
that are potentially available to a given list.  The list owner
then picks from that library for his list, and assembles and orders
them (building a symlink table on disk) via his web interface (drop
and combo boxes).
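
One way the web interface could turn the owner's ordered picks into a
symlink table on disk, as a sketch only. The function name
build_symlink_table, the directory layout, and the three-digit
ordering prefix are all assumptions made for illustration.

```python
import os


def build_symlink_table(library_dir, list_dir, chosen):
    """Recreate list_dir as ordered symlinks into library_dir.

    `chosen` is the owner's ordered selection of script names. Each link
    gets a numeric prefix so that directory sort order equals the chosen
    order. Names that are not in the installed library are rejected.
    """
    available = set(os.listdir(library_dir))
    unknown = [name for name in chosen if name not in available]
    if unknown:
        raise ValueError("not in library: %s" % ", ".join(unknown))
    os.makedirs(list_dir, exist_ok=True)
    for old in os.listdir(list_dir):
        old_path = os.path.join(list_dir, old)
        if os.path.islink(old_path):
            os.remove(old_path)  # drop the previous ordering
    for position, name in enumerate(chosen):
        target = os.path.abspath(os.path.join(library_dir, name))
        link = os.path.join(list_dir, "%03d-%s" % (position, name))
        os.symlink(target, link)
    return sorted(os.listdir(list_dir))
```

Since the web side can only link to names already present in the
library, the set of runnable code stays under the CLI admin's control.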
...
Instead, you store scripts, and the CLI admin manages that
process, but configuration is within Mailman, and web based.
Precisely.
...
i've been working on a new API for the
moderator/autobounce/admin/anti-spam stuff. I'll post that in a
day or so, what I have, because I think the way I'm putting it
together is relevant to how I think the overall control system
could be done.
I haven't really thought about bounce processing at all yet.
...
You want to embed nothing (IMHO), because it reduces the
complexity of all of the pieces and forces you to keep the
interfaces clean and rigorous.
Yeah.
...
...
I don't see the different queues needing markedly different
designs, but their supporting processes need to be cleanly
divisible.  The base structures end up markedly similar after
that.
...
Other than, say, imagining a system where archives are on a
different machine (or two), and the search engine on a third (or
fourth), so you want to be able to distribute the processing
cleanly.... And the realization that archives and digest stuff can
be held in a low-priority queue and turned into idle-time
processing tasks. A big plus if you've got a busy system a little
closer to the edge than you like.
I haven't thought about system load sensitivities yet, but I don't
see any innate reason they couldn't be another variable thrown into
the, "What am I currently allowed to process" equation.
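
A sketch of system load as one more variable in that equation. The
function name allowed_queues, the queue names, and the thresholds are
hypothetical; os.getloadavg is the standard-library call for the
load averages on Unix systems.

```python
import os


def allowed_queues(queues, load=None):
    """Return the names of queues that may be processed at the current load.

    `queues` maps queue name -> highest 1-minute load average at which
    that queue is still serviced. Low-priority work such as archiving
    gets a low threshold, so it only runs when the machine is idle.
    """
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average
    return [name for name, max_load in queues.items() if load <= max_load]


# Illustrative thresholds only.
EXAMPLE_QUEUES = {"outbound": 8.0, "archive": 2.0, "digest": 1.0}
```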
...
...
Process fork overhead is a problem I've not confronted yet.
...
And I wouldn't worry about it much.  don't think it's going to be
a problem, other than in the MLM->MTA interface where you might be
doing a lot of spawning and forking to parallelize, VERP, or
whatever.
My idea for VERP is trivially simple:
The member script which generates the list of RCPT TOs which are
attached to a pending message will periodically add a second token
(a hash value) after the email address, separated by whitespace.
Note: instead of text a DBM would work just as well, perhaps
better.
The process that then picks up a message from outbound notices the
hash token and constructs a special envelope for that address
only, using the hash string as a +suffix to the envelope return
address.
Want VERP all the time?  Members always generates hash values.  Or
just a percentage of the time, or as a function of how long it was
since we last caught a bounce from that address, or as a function of
how much we like that domain.
The idea is that VERPed messages are built on the instant of handing
them off to an MTA.
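
The VERP scheme described above can be sketched as follows. The
function names, the example bounce address, and the exact line format
("address" or "address token") are assumptions based on the
description; how the member script decides when to emit a token is
left open, as it is in the text.

```python
def parse_recipients(lines):
    """Yield (address, token) pairs; token is None when no hash was added."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        address = fields[0]
        token = fields[1] if len(fields) > 1 else None
        yield address, token


def plan_envelopes(lines, bounce_address):
    """Group recipients into SMTP envelopes at hand-off time.

    Addresses without a token share one envelope using the plain bounce
    address. Each address with a token gets its own envelope whose
    sender is the bounce address with "+token" added to the local part.
    Returns a list of (envelope_sender, [recipients]).
    """
    local, domain = bounce_address.rsplit("@", 1)
    shared = []
    envelopes = []
    for address, token in parse_recipients(lines):
        if token is None:
            shared.append(address)
        else:
            sender = "%s+%s@%s" % (local, token, domain)
            envelopes.append((sender, [address]))
    if shared:
        envelopes.insert(0, (bounce_address, shared))
    return envelopes
```

A bounce arriving at the +suffix address then identifies the failing
member by the token alone, without parsing the bounce body.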
...
And that can be minimized and avoided with some careful design. In
the rest of the system, don't bother. When I'm talking about
lightweight, I was meaning code complexity and feature creep. You
want to stuff as much into external code pieces that are brought
in via queueing and messaging, and keep it out of the control
piece.
Bingo.
...
...
BTW I'd like to have the MLM archive messages such that a member
can request, "SEND ME POST XXX" and have the MLM send it to him.
Ditto for digests.  This is in addition to any web archiving.
...
and another flavor of digest, what I call the HTML-TOC. Simply a
message full of digest info (poster, subject, maybe the first
couple of lines), and a URL to pull it out of archives. Some folks
want a digest to skim, some folks only want header data -- so why
send all those bytes that won't be read?
Ahh, excellent point.  Digests really should be an OOB process
handled by their own queue.  Yup.  Absolutely.
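
A sketch of one entry in the table-of-contents digest described above
(poster, subject, the first couple of lines, and a URL into the
archives). The function name toc_entry and the output layout are
hypothetical; only single-part plain-text posts get a preview here.

```python
from email import message_from_string


def toc_entry(raw_message, archive_url, preview_lines=2):
    """Format one post as poster, subject, a short preview, and its URL."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        body = ""  # multipart post: no preview in this sketch
    preview = [ln for ln in body.splitlines() if ln.strip()][:preview_lines]
    lines = [
        "From: %s" % msg.get("From", "(unknown)"),
        "Subject: %s" % msg.get("Subject", "(no subject)"),
    ]
    lines += ["  " + ln for ln in preview]
    lines.append(archive_url)
    return "\n".join(lines)
```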
--
J C Lawrence                                       claw@kanga.nu
---------(*)                        : http://www.kanga.nu/~claw/
--=| A man is as sane as he is dangerous to his environment |=--
