Re: [Mailman-Developers] Huge lists
On Thu, 25 May 2000 00:29:48 -0700 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
At 12:05 AM -0700 5/25/2000, J C Lawrence wrote:
It's tough to imagine a situation where my time and effort in replacing them (as a solo effort) would actually be worth it versus throwing hardware at the problem or chatting up Wietse & co.
throwing hardware at a problem isn't always possible.
True, it's an argument for mid-range problems: problems that aren't so large that your only effective recourse is optimisation.
but the place where rolling your own internal MTA starts becoming useful is when the list is big enough that the disk I/O involving the MTA starts becoming the significant limiter.
Arguably disk IO is the only limiting factor in an MTA which you have total control over. Everything else, network latency, DNS, remote MTAs etc etc etc are out of the hands of the MTA author.
This is one of the reasons that CP uses solid state disks for their mail spools. Even with QMail's quite optimised disk IO, it still hurts.
VERP exacerbates the problem, since # of batches sent to the MTA equals the # of addresses, which explodes the number of control files, which...
Exactly, which is why I proposed in the asynch model that the handler process only feed messages to the MTA at no more than a set rate. That way the local SysAdm can set the rate to something that his MTA is likely to be able to more or less keep up with (or at least not fall too far behind on).
Tough balancing point there. Too fast and you drown the MTA. Too slow and you starve and delivery time is needlessly extended. Getting it right all the time is impossible due to remote delivery problems being unpredictable and the absence of any feedback mechanisms between a fully abstracted MLM and the MTA.
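A minimal sketch of the "no more than a set rate" feeder described above, assuming a hypothetical `hand_to_mta` callable that stands in for whatever actually submits a message to the local MTA. The function name, parameters, and the injectable `clock`/`sleep` hooks are illustrative only and not part of Mailman.

```python
import time


def feed_at_rate(messages, hand_to_mta, max_per_second,
                 clock=time.monotonic, sleep=time.sleep):
    """Hand messages to the MTA no faster than max_per_second.

    hand_to_mta is a hypothetical callable (an assumption, not a Mailman API).
    clock and sleep are injectable so the pacing can be tested without waiting.
    """
    interval = 1.0 / max_per_second  # minimum spacing between two hand-offs
    next_slot = clock()              # earliest time the next hand-off may happen
    sent = 0
    for msg in messages:
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)   # wait out the remainder of the interval
        hand_to_mta(msg)
        sent += 1
        # If the hand-off itself was slow, pace from "now" rather than
        # bursting to catch up on the missed slots.
        next_slot = max(next_slot, clock()) + interval
    return sent
```

The rate stays a single knob for the local admin to tune. It does nothing about the feedback problem raised above: the feeder still cannot tell whether the MTA is actually keeping up.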
This is one of the ways in which EZMLM has it easy if what I've been told is correct. I've been told (haven't checked) that it kinda fakes VERP, not exploding the spool, but building the messages with the VERP contents only upon MX connection. Prior to that it's just a template with a long attached address list. That makes it really cheap. But then again, that's due to the fact that EZMLM is explicitly tied to QMail...
So at some point, it makes sense to deliver direct to recipient rather than build batches into the MTA, and completely avoid the disk I/O and deliver right out of the database to the receiving SMTP client. You could strongly parallelize the delivery setup because you'd do away with all of the MTA overhead, and do all sorts of fun things, like prioritize your delivery sorting and the like.
Yeah, all sorts of fun things, like, in essence, writing the entire delivery and queue handling side of a normal MTA, with the single extension of doing last-second runtime generation of VERP messages at the instant you send them down the wire (failed delivery? trash the temp VERP message and leave that address as "undelivered").
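To make the "template plus address list, VERP only at send time" idea concrete, here is a rough sketch. It assumes one common VERP encoding (recipient's `@` replaced by `=`, appended to the bounce address after a `+`); the delimiter choice, the function names, and the example addresses are all assumptions for illustration, not EZMLM's or Mailman's actual behaviour.

```python
def verp_sender(list_bounce_address, recipient):
    """Build a per-recipient envelope sender.

    Assumed encoding: local+rcptlocal=rcptdomain@listdomain.
    Other MTAs and MLMs use different delimiters.
    """
    local, domain = list_bounce_address.rsplit("@", 1)
    rcpt_local, rcpt_domain = recipient.rsplit("@", 1)
    return f"{local}+{rcpt_local}={rcpt_domain}@{domain}"


def verp_envelopes(template, list_bounce_address, recipients):
    """Lazily yield (envelope_sender, recipient, message) tuples.

    Only one template is stored; each per-recipient envelope exists
    just long enough to be handed to the delivery code.
    """
    for rcpt in recipients:
        yield verp_sender(list_bounce_address, rcpt), rcpt, template
```

Because `verp_envelopes` is a generator, nothing per-recipient is written to a spool: a failed delivery just means the tuple is dropped and the address stays marked as undelivered.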
Yep, you'll need that for very large mailing lists (eg >1M subscribers and >1 post per hour). But, I really don't think that's Mailman's audience. That's LServe's audience. There are damned few lists out there that big...
Which, if you're trying to deliver 5,000,000 emails a day and do so within a time-sensitive time period gets important -- and for the other 99.5% of the universe, just doesn't matter that much (snork).
Bingo.
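For scale, the volumes mentioned above can be turned into sustained per-second rates. This is plain arithmetic on the figures quoted in the thread (5,000,000 emails a day; a 1M-subscriber list with one post per hour); the helper name is made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86400
SECONDS_PER_HOUR = 60 * 60       # 3600


def sustained_rate(messages, seconds):
    """Average deliveries per second needed to clear `messages` in `seconds`."""
    return messages / seconds


# 5,000,000 emails spread evenly over a day
daily_rate = sustained_rate(5_000_000, SECONDS_PER_DAY)

# one post to 1,000,000 subscribers, finished before the next hourly post
hourly_rate = sustained_rate(1_000_000, SECONDS_PER_HOUR)

print(f"{daily_rate:.1f} msgs/sec, {hourly_rate:.1f} msgs/sec")
```

These are averages over the whole window. A time-sensitive delivery has to peak well above them.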
True. Were Mailman asynchronous, a pattern as below would seem useful:
There is never more than a single "queued message handler" process (maybe multi-threaded, or not). That process guarantees not to feed messages to the MTA any faster than XXX messages per second/minute, and to stop such feeding were system load to rise above ZZZ. The single instance rule prevents multiple handler processes for multiple mailing lists maxxing out the MTA as they all dump simultaneously.
Above quoted only to provide reference to the earlier text on queue handling.
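The single-instance rule and the "stop feeding above load ZZZ" rule from the quoted pattern could be sketched roughly as follows. This assumes a Unix host (it uses `fcntl.flock` and `os.getloadavg`), and the lock path, function names, and threshold are hypothetical, not anything Mailman ships.

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path):
    """Try to become the one queued-message handler.

    Returns an open file object holding an exclusive lock, or None if
    another handler already holds it. Keep the returned file open for
    the life of the process; closing it releases the lock.
    """
    handle = open(lock_path, "w")
    try:
        # LOCK_NB: fail immediately instead of waiting for the other holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def load_allows_feeding(max_load, get_load=os.getloadavg):
    """True while the 1-minute load average is at or below max_load (the ZZZ knob)."""
    return get_load()[0] <= max_load
```

A handler loop would call `load_allows_feeding` before each hand-off and back off while it returns False. Like the rate limit, this only looks at the local box and stays MTA agnostic.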
Oh, queuing theory is such fun. I got into computers to AVOID math...
Heck, I got out of math because I was too damned lazy...
The problem of multiple list servers (boxes) dumping simultaneously to a remote MTA is properly, I believe, outside of Mailman's purview.
I don't see a value in trying to monitor MTA queue size. Too MTA specific.
See the disk I/O issues above. In a perfect world, the MTA would self-throttle to avoid overload conditions. In practice, you have to be careful to both tune the MTA to maximize output, and the MLM to avoid blowing it out. If you have a burst that stuffs 2500 batches into a sendmail queue all at once, then sendmail has that big directory problem in a big way, and your system goes to hell.
While true, that's a sticky wicket. Mailman is MTA agnostic (a Good Thing). Yup, you can build all sorts of intelligence and queue handling techniques in, but the more you do there, the less and less clean (or possible) your abstractions are going to be, and the more you are going to tie yourself into a specific configuration.
We could, without too too much effort, whack Mailman into being, say, Postfix specific and actively conspiring at runtime with Postfix for optimal queue delivery yada yada whoopdedoo yabba yabba and we'd get something that could potentially easily handle a million subscriber list with a post an hour.
But we don't want to do that. EZMLM has already done that. Last I checked there was a nascent project to build a Postfix-specific MLM a la EZMLM (never noticed if it got off the ground).
I wonder how much of this could be driven out of something like Midgard? But loading your entire archives into a database gives you the ability to do all sorts of interesting linking and searching and stuff, and "all" you'd need is some email->XML converter, and then...
I'm already kinda busy on another side of this:
http://www.kanga.nu/archives/Meta-L/2000Q2/msg00242.html http://www.kanga.nu/archives/Meta-L/2000Q2/msg00246.html
Oh, man. We need to at least pretend to be on topic for this list, but I need a white board and a pen... (scribbly scribble...)
Gods, I need three of me.
-- J C Lawrence Internet: claw@kanga.nu ----------(*) Internet: coder@kanga.nu ...Honorary Member of Clan McFud -- Teamer's Avenging Monolith...
At 7:38 PM -0700 5/25/2000, J C Lawrence wrote:
While true, that's a sticky wicket. Mailman is MTA agnostic (a Good Thing).
Yes.
Yup, you can build all sorts of intelligence and queue handling techniques in, but the more you do there, the less and less clean (or possible) your abstractions are going to be, and the more you are going to tie yourself into a specific configuration.
So -- let's don't. Instead, use some simple throttle values: the number of parallel threads delivering into the MTA would do 90% of the job. Some #addrs/period-of-time throttle might not be a bad idea, but simply saying "the MTA can handle so many parallel deliveries" does most of the job.
The rest, which is how to set up your MTA and tune the MLM's connection to it, then boils down to a set of READMEs or HOWTOs. Keep it out of the code completely.
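The "MTA can handle so many parallel deliveries" throttle might look something like the sketch below, assuming a hypothetical `deliver` callable that pushes one batch into the MTA. The function and parameter names are illustrative; the only knob is `max_parallel`.

```python
from concurrent.futures import ThreadPoolExecutor


def deliver_in_parallel(batches, deliver, max_parallel):
    """Run deliver(batch) for every batch, never more than max_parallel at once.

    deliver is a hypothetical callable (an assumption); results come back
    in the same order as the input batches.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(deliver, batches))
```

Nothing in here knows which MTA sits on the other end, so the tuning advice can stay in a README or HOWTO rather than in the code.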
We could, without too too much effort, whack Mailman into being, say, Postfix specific and actively conspiring at runtime with
And if someone wants that, if we make the interface modular and with a published interface, they can write one and submit it for a future release... (grin)
I realize the last couple of days we've brought forward a lot of neat stuff and then decided it's best NOT to do it, but sometimes the best thing you can do to make a project work is define what it's NOT, so you can focus on what it is. And put everything else down in the TODO for a future generation to wonder about.
-- Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com) Apple Mail List Gnome (mailto:chuq@apple.com)
And they sit at the bar and put bread in my jar and say 'Man, what are you doing here?'"