[Mailman-Developers] Huge lists
Chuq Von Rospach
Thu, 25 May 2000 00:29:48 -0700
At 12:05 AM -0700 5/25/2000, J C Lawrence wrote:
>It's tough to
>imagine a situation where my time and effort in replacing them (as a
>solo effort) would actually be worth it versus throwing hardware
>at the problem or chatting up Wietse & co.
Throwing hardware at a problem isn't always possible. But the place
where rolling your own internal MTA starts becoming useful is when
the list is big enough that the MTA's disk I/O becomes the
significant limiter. With sendmail 8.9.x, that's fairly
easy to run into. With sendmail 8.10, it seems to be better, and the
multiple queue stuff solves a multitude of problems involving huge
queues.
VERP exacerbates the problem, since the number of batches sent to the MTA
equals the number of addresses, which explodes the number of control
files, which... So at some point, it makes sense to deliver directly to the
recipient rather than build batches into the MTA, completely
avoiding the disk I/O and delivering right out of the database to the
receiving SMTP server. You could heavily parallelize the delivery
setup because you'd do away with all of the MTA overhead, and do all
sorts of fun things, like prioritizing your delivery sorting and the...
Which, if you're trying to deliver 5,000,000 emails a day within a
tight time window, gets important -- and for the
other 99.5% of the universe, just doesn't matter that much (snork).
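To make the VERP arithmetic concrete, here is a minimal Python sketch, assuming a `listname-bounces+user=host@listdomain` style envelope sender. The helper names and addresses are hypothetical, not Mailman's code. Without VERP, recipients can share one envelope per receiving domain; with VERP, each recipient needs its own envelope sender, so the batch count equals the address count.

```python
from collections import defaultdict

# Assumed VERP template: list-bounces+mailbox=host@list-domain (hypothetical).
VERP_TEMPLATE = "{bounces}+{mailbox}={host}@{domain}"


def verp_sender(bounces_local, list_domain, recipient):
    """Encode one recipient address into a per-recipient envelope sender."""
    mailbox, host = recipient.split("@", 1)
    return VERP_TEMPLATE.format(bounces=bounces_local, mailbox=mailbox,
                                host=host, domain=list_domain)


def plain_batches(sender, recipients):
    """Without VERP: one envelope per receiving domain, many RCPT TOs each."""
    by_domain = defaultdict(list)
    for rcpt in recipients:
        by_domain[rcpt.split("@", 1)[1].lower()].append(rcpt)
    return [(sender, rcpts) for rcpts in by_domain.values()]


def verp_batches(bounces_local, list_domain, recipients):
    """With VERP: every recipient gets its own envelope, so one batch each."""
    return [(verp_sender(bounces_local, list_domain, r), [r]) for r in recipients]


if __name__ == "__main__":
    members = ["a@aol.com", "b@aol.com", "c@example.net"]
    print(len(plain_batches("mylist-bounces@lists.example.org", members)))  # 2
    print(len(verp_batches("mylist-bounces", "lists.example.org", members)))  # 3
```

Each of those VERP batches becomes its own queue entry in the MTA, which is where the control-file explosion comes from.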
> I've written list
>servers and mini-MTAs before. There's a fair bit of hidden
>complexity and brain hurt in there I don't mind avoiding.
Yes, that's very true. Just dealing with MX gets gnarly.
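For a taste of the MX gnarliness, here is a small Python sketch of just the host-ordering step, following the rules later written down in RFC 5321 section 5.1: try lower preference values first, shuffle hosts that share a preference, and fall back to the domain itself when there is no MX at all. It assumes the DNS lookup has already produced `(preference, hostname)` pairs (`order_mx` is a hypothetical name), and it deliberately skips the other hard parts: dropping your own host and every MX ranked no better than it, timeouts, and retries.

```python
import random


def order_mx(mx_records, domain, rng=random):
    """Return the hosts to try, in order.

    mx_records: (preference, hostname) pairs from an MX lookup (done elsewhere).
    Lower preference values are tried first; hosts sharing a preference are
    shuffled so load spreads across them. With no MX records at all, the
    domain itself is used as an implicit MX.
    """
    if not mx_records:
        return [domain]
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host.rstrip(".").lower())
    ordered = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered
```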
>True. Were Mailman asynchronous, a pattern as below would seem
> There is never more than a single "queued message handler" process
> (maybe multi-threaded, or not). That process guarantees not to
> feed messages to the MTA any faster than XXX messages per
> second/minute, and to stop such feeding if system load rises
> above ZZZ. The single-instance rule prevents multiple handler
> processes for multiple mailing lists from maxing out the MTA as they
> all dump simultaneously.
That's basically how my big machine has evolved -- I'm using three
queues: the first generates delivery batches (and requeues them into
the second), the second parallelizes bulk_mailers into the MTA, and
a third handles just smartbounce and non-delivery batches, to keep
them out of the way... It's nice, because my setup batch can generate
a bunch of batches, and it's up to the queue system to make sure only
"N" of them are running at any time. Any batch that hits slow
domains doesn't back up huge numbers of addresses, because the waiting
batches slip into the other slots. Oh, queuing theory is such fun. I got
into computers to AVOID math...
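The "only N running, waiting batches slip into free slots" behaviour is essentially a fixed-size worker pool. A minimal Python sketch using `concurrent.futures.ThreadPoolExecutor` shows that one slow batch only occupies its own slot. It assumes each batch is a dict with an `id` and a `delay`, and uses a hypothetical `simulated_deliver` stand-in for a real bulk_mailer run.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_deliver(batch):
    """Hypothetical stand-in for one bulk_mailer run: just sleeps."""
    time.sleep(batch["delay"])


def run_batches(batches, deliver, slots):
    """Run batches with at most `slots` in flight; return ids in completion order."""
    done = []
    lock = threading.Lock()

    def worker(batch):
        deliver(batch)
        with lock:
            done.append(batch["id"])

    with ThreadPoolExecutor(max_workers=slots) as pool:
        futures = [pool.submit(worker, b) for b in batches]
    for f in futures:
        f.result()  # re-raise any delivery error instead of silently losing it
    return done


if __name__ == "__main__":
    batches = [{"id": "slow-domains", "delay": 0.5}] + [
        {"id": f"fast-{i}", "delay": 0.01} for i in range(3)
    ]
    print(run_batches(batches, simulated_deliver, slots=2))
```

With two slots, the fast batches all cycle through the second slot while the slow one is still grinding away in the first.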
> The problem of multiple list servers
> (boxes) dumping simultaneously to a remote MTA is properly, I
> believe, outside of Mailman's purview.
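The single "queued message handler" pattern J C Lawrence describes above can be sketched in Python roughly as follows, assuming a hypothetical `submit` callable that hands one message to the MTA. It spaces submissions at most `rate` per second and holds off while the 1-minute load average (from `os.getloadavg()`, Unix only) exceeds `max_load`. The clock, sleep, and load functions are injectable so the logic can be exercised without a real MTA; the single-instance rule itself (e.g. a lock file) is left out.

```python
import os
import time


class ThrottledFeeder:
    """One queued-message handler that feeds the MTA at a bounded rate.

    submit:   hypothetical callable handing one message to the MTA
    rate:     maximum messages per second (the "XXX" limit)
    max_load: pause while the 1-minute load average exceeds this (the "ZZZ" limit)
    """

    def __init__(self, submit, rate, max_load,
                 loadavg=lambda: os.getloadavg()[0],
                 clock=time.monotonic, sleep=time.sleep, load_poll=5.0):
        self.submit = submit
        self.interval = 1.0 / rate
        self.max_load = max_load
        self.loadavg = loadavg
        self.clock = clock
        self.sleep = sleep
        self.load_poll = load_poll
        self._next_slot = None  # earliest time the next message may go out

    def feed(self, messages):
        for msg in messages:
            # Stop feeding while the system is overloaded.
            while self.loadavg() > self.max_load:
                self.sleep(self.load_poll)
            # Space submissions at least `interval` seconds apart.
            now = self.clock()
            if self._next_slot is not None and now < self._next_slot:
                self.sleep(self._next_slot - now)
                now = self._next_slot
            self.submit(msg)
            self._next_slot = now + self.interval
```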
>I don't see a value in trying to monitor MTA queue size. Too MTA
See the disk I/O issues above. In a perfect world, the MTA would
throttle itself to avoid overload conditions. In practice, you
have to be careful both to tune the MTA to maximize output and to tune the
MLM to avoid blowing it out. If you have a burst that stuffs 2500
batches into a sendmail queue all at once, then sendmail hits the big
directory problem in a big way, and your system goes to hell.
Sendmail 8.10 goes a long way toward minimizing this, but you can still
force your MTA to thrash, and when you do, everything gets really
unhappy. So perhaps you don't need the MLM to monitor the MTA
constantly and throttle itself, but that's actually not a bad thing,
IMHO, if it can be done reasonably. On the other hand, I wouldn't
make it a big focus, since it'd be a LOT easier to write some docs on
how to tune the system and what to watch out for, and let the admin
do the tuning. Once the tuning is done, it probably won't require
much watching...
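If you do want the MLM to keep an eye on the MTA, one crude check is to count sendmail's queue control files (the `qf*` files) before handing over more batches. A minimal Python sketch, assuming the control files sit directly in each listed queue directory and that `high_water` is a number you pick while tuning; `submit` and both function names are hypothetical.

```python
import os
import time


def queue_depth(queue_dirs):
    """Count sendmail queue control files ("qf*") across the given directories."""
    total = 0
    for qdir in queue_dirs:
        try:
            names = os.listdir(qdir)
        except FileNotFoundError:
            continue  # a queue directory that doesn't exist holds nothing
        total += sum(1 for name in names if name.startswith("qf"))
    return total


def submit_when_drained(queue_dirs, high_water, submit, batch,
                        poll=30.0, sleep=time.sleep):
    """Hold `batch` back until the MTA queue is below `high_water`, then submit it."""
    while queue_depth(queue_dirs) >= high_water:
        sleep(poll)
    submit(batch)
```

Listing a huge queue directory is itself disk I/O, so poll sparingly; once the system is tuned, a check like this should rarely fire.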
>> Well, this is probably preaching to the choir, but I've gotten
>> quite convinced that you should isolate every piece you can from every
>> other piece, and document the interfaces. That makes it quite easy
>> to swap out a new piece without affecting the rest of the system.
>This is often called "programming by contract". It's a Good Thing.
Heh. It's also called "breaking a huge project down into tiny pieces
so your customers don't worry nearly as much about deadlines"...
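In Python terms, the "isolate every piece and document the interface" idea might look like the sketch below. List-side code depends only on a small `Deliverer` contract, so an SMTP hand-off to the local MTA and a test double (or, later, a direct-to-recipient deliverer) can be swapped freely. The class and function names are hypothetical; `smtplib.SMTP.sendmail` is the real standard-library call and returns a dict of the refused recipients.

```python
import smtplib
from typing import Dict, List, Protocol, Tuple

Refused = Dict[str, Tuple[int, bytes]]


class Deliverer(Protocol):
    """Contract: send one message to `recipients`; return the refused ones."""

    def deliver(self, sender: str, recipients: List[str], message: bytes) -> Refused: ...


class SMTPHandoff:
    """Hands the message to an MTA over SMTP (e.g. sendmail on localhost)."""

    def __init__(self, host: str = "localhost", port: int = 25) -> None:
        self.host, self.port = host, port

    def deliver(self, sender: str, recipients: List[str], message: bytes) -> Refused:
        with smtplib.SMTP(self.host, self.port) as smtp:
            return smtp.sendmail(sender, recipients, message)


class RecordingDeliverer:
    """Test double: records each call and refuses a configured set of addresses."""

    def __init__(self, refuse=()) -> None:
        self.sent: List[Tuple[str, List[str], bytes]] = []
        self.refuse = set(refuse)

    def deliver(self, sender: str, recipients: List[str], message: bytes) -> Refused:
        self.sent.append((sender, list(recipients), message))
        return {r: (550, b"refused") for r in recipients if r in self.refuse}


def send_post(deliverer: Deliverer, sender: str,
              recipients: List[str], message: bytes) -> List[str]:
    """List-side code sees only the contract, so the backend can be swapped."""
    return sorted(deliverer.deliver(sender, recipients, message))
```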
>One of my list members has been advocating WebCrossing. What do you
>think of it?
Not appropriate for this list. Let's talk offline. I'm designing it
out of my systems in favor of other things, but the reasons are
complex -- and I've recommended it INTO at least one major
development project at the same time. So I guess the answer is "it
depends, but I'm not going to be using it myself..."
> > you could do something really nice with PHP and MySQL, too, and
>Yeah, I've thought about that but I really just don't see enough
>advantage to justify the time it would take to get something better
>than I have now.
I wonder how much of this could be driven out of something like
Midgard? But loading your entire archives into a database gives you
the ability to do all sorts of interesting linking and searching and
stuff, and "all" you'd need is some email->XML converter, and then...
Oh, man. We need to at least pretend to be on topic for this list,
but I need a white board and a pen... (scribbly scribble...)
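The "email->XML converter" piece is the easy part with Python's standard library. A minimal sketch, assuming a made-up flat `<message>` schema (not Midgard's or anyone else's format; `message_to_xml` is a hypothetical name), keeps the threading headers so an archive database could later link replies together:

```python
import xml.etree.ElementTree as ET
from email import policy
from email.parser import BytesParser

# Headers worth keeping for search and for linking replies into threads.
KEPT_HEADERS = ("From", "To", "Date", "Subject",
                "Message-ID", "In-Reply-To", "References")


def message_to_xml(raw: bytes) -> str:
    """Convert one raw RFC 822 message into a small XML document."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name in KEPT_HEADERS:
        value = msg.get(name)
        if value is not None:
            ET.SubElement(headers, "header", name=name).text = str(value)
    # Prefer the text/plain body; attachments and HTML parts are ignored here.
    part = msg.get_body(preferencelist=("plain",))
    ET.SubElement(root, "body").text = part.get_content() if part is not None else ""
    return ET.tostring(root, encoding="unicode")
```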
Chuq Von Rospach - Plaidworks Consulting (mailto:firstname.lastname@example.org)
Apple Mail List Gnome (mailto:email@example.com)
"And they sit at the bar and put bread in my jar
and say 'Man, what are you doing here?'"