[Mailman-Developers] Re: [Mailman-Users] Problem with qrunner and too much incoming mail

Sat, 4 Nov 2000 10:29:15 -0800

>>  Once a connection to the MX is established, bulkmail would then
>>  just start delivering messages to it until the bin was emptied.
>>  Any i/o blocks in any of the processes will allow async* to switch
>>  to a different delivery channel.  We may need to do some explicit
>>  channel management to make sure some are not starved.
>
>Ouch.  I really don't like this idea.

Neither do I. This is actually something that I've looked at long and 
hard in my non-mailman server work. After a fair amount of work and 
research, I finally came to the conclusion that you are MUCH better 
off letting the MTA do the MTA's work, and letting the MLM do the MLM 
work, and once you make the decision that the MLM has to *also* 
become an MTA, you're going down a road you don't want to travel.

Sendmail, for instance, has many years experience optimizing delivery 
as an MTA. It's a complex, nasty business with lots of subtleties. If 
you're building a list manager, how much work would you need to do to 
get a private delivery system that's as well tuned and efficient as 
sendmail already is? Ditto all of the other MTAs.

I've built some prototype systems to test this. Even though (in 
theory) you're adding a layer of delivery and other overhead, it's 
very difficult to come even close to the performance a tuned MTA can 
give you -- and you're writing a lot of code to do it.

One of the systems I've been investigating, for instance, would do 
100% customized mail driven by a template document and pulling data 
out of a database -- with a design parameter of up to 10 million 
deliveries. The goal is at least 500K deliveries an hour, preferably 
double that. Right now, on a system with a sendmail 8.9 base and a 
non-optimized delivery tool, I'm doing 400-450K/hour. I expect to see 
a nice addition when I move to sendmail 8.10.x in a week or so. This 
is on a Sun E250, FWIW, with the sendmail queues living in a ram 
disk. Good sized hardware, but not particularly big or fast hardware.

Instead of reinventing the MTA wheel, I think we're much better off 
coming up with an MTA -> MLM interface that's very flexible and 
highly configurable (most especially in how to deliver and how much 
to parallelize the infeed to the MLM), and then focus on how to tune 
the MTA and MLM through documentation.

Splitting the inbound and outbound queue would be my first thing 
here, and probably split bounces into a third queue. That's a pretty 
quick, easy optimization that makes sure the end user sees fast 
response without being bogged down by deliveries, and that's a huge 
perception issue. Then focus on parallelizing the delivery from 
mailman into the MTA, and make that configurable so each admin can 
tune it to their system and needs.
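
A rough sketch of the queue split, to make the idea concrete. The 
directory names, the ".msg" suffix and the function names are all 
illustrative assumptions, not Mailman's actual qfiles layout:

```python
import os
import tempfile
import time
import uuid

# Hypothetical queue names; Mailman's real qfiles layout may differ.
QUEUES = ("inbound", "outbound", "bounce")


def make_queues(root):
    """Create one directory per queue under root and return their paths."""
    paths = {}
    for name in QUEUES:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)
        paths[name] = path
    return paths


def enqueue(paths, queue, payload):
    """Write payload (bytes) into the named queue.

    The entry is written under a temporary name and then renamed, so a
    queue runner scanning for ".msg" files never sees a partial write.
    """
    qdir = paths[queue]
    name = "%d-%s" % (time.time_ns(), uuid.uuid4().hex)
    tmp = os.path.join(qdir, name + ".tmp")
    final = os.path.join(qdir, name + ".msg")
    with open(tmp, "wb") as f:
        f.write(payload)
    os.rename(tmp, final)
    return final


def pending(paths, queue):
    """Return the completed entries in one queue, sorted by file name."""
    qdir = paths[queue]
    return sorted(n for n in os.listdir(qdir) if n.endswith(".msg"))


if __name__ == "__main__":
    paths = make_queues(tempfile.mkdtemp())
    enqueue(paths, "inbound", b"a new post from a subscriber")
    enqueue(paths, "bounce", b"a delivery failure notice")
    for q in QUEUES:
        print(q, len(pending(paths, q)))
```

With separate directories, each queue can get its own runner and its 
own schedule, so a large outbound backlog doesn't delay inbound posts 
or bounce handling.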

>As discussed previously amongst Chuq, Nigel and me, the needs of
>large list server systems are rather different from the normal home
>hobbyist requirements, but are not completely alien.  However, the
>needs of very large list installations (cf ListServ, Egroups, or
>SourceForge) are rather different yet again.

This is a basic reality -- things don't scale. Or worse, they scale 
for a while, and then you need to switch paradigms. I found that one 
out the hard way. If someone wants a lecture on how to scale mail 
list servers infinitely, I'd be happy to explain how, since I've had 
to develop an architecture to do so. The nice thing is, it can be 
done without exceptional engineering hassles -- but it's not just 
adding another daemon or a faster CPU (although those are solutions 
for parts of it, just not ALWAYS the solutions).

>  I'm not convinced of
>the value in beating on Mailman to support the (comparatively rare
>if high profile) very large installations when the current (much
>larger and more common Mailman-wise) mid-size realm still needs
>attention.  Certainly, such changes should not detract from
>Mailman's current level of suitability for smaller installations.

I think we can build a Mailman that does this, at least for, oh, 95% 
of the universe out there, and the other 5% are going to have custom 
solutions anyway (or should!). What we don't want to do is screw up 
Mailman for the "typical" user to make it work for the big site; but 
we also don't want Mailman to get a reputation as a "small server 
only" system, because it'll cause people to reject it in 
implementations. Fortunately, I don't think you need to do that. It 
just needs some tweaking.

>support for intermittently connected nodes.  Say something like:
>
>   Cron launches the bulkmailer.
>   The bulkmailer forks N children processing the queue.
>   The bulkmailer exits upon an empty queue.
>   Should cron launch a new bulkmailer when the previous incarnation
>     hasn't exited yet, the new instance merely exits immediately.
>
>Locking for the above is fairly simple.  Standard IPCs can be used
>for the instance collision checks.  Locking on the hash queues could
>be a bit interesting from a portability and performance vantage given
>the fact that the list side will be attempting to deliver into the
>same tree at the same time that deliveries are happening (no more
>lock collisions please) -- which pretty much requires that locking
>be on the queue-entry level rather than the hash bucket level.  Not
>rocket science, just a bit finicky.
>
>Will this handle SourceForge?  Probably.

On reasonable hardware, definitely. That's basically how my current 
custom system works. Right now, the number of parallel infeeds from 
mailman is 1. I'm willing to bet the delivery MTA is basically idle 
and bored. By moving to parallel infeeds, you can stoke the MTA up to 
speed, and the trick is for each site to figure out what number of 
parallel infeeds keeps the queues full enough that the MTA stays 
busy, without overloading them and causing the MTA to thrash. 
That's simply a case of tuning and modelling. And simply allowing "N" 
infeed threads to the MLM will solve SourceForge's problem and pretty 
much everyone else's, without having to get into the MTA business, 
where the best we can really hope is to be "as good" as the real MTA.
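
The cron-launched bulkmailer quoted above, with "N" parallel infeeds, 
could look roughly like the sketch below. It is only a sketch under 
stated assumptions: the function names, the ".msg" and ".work" 
suffixes and the hand_to_mta callable are hypothetical, threads stand 
in for forked children, and fcntl.flock makes it Unix-only. In real 
use hand_to_mta would pass the message to the local MTA (for example 
over SMTP); retry and error handling are left out.

```python
import fcntl
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def acquire_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if another
    bulkmailer already holds it (the new instance should then just exit)."""
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def claim(qdir, name):
    """Claim one queue entry by renaming it; return its new path or None.

    rename() is atomic on a local POSIX filesystem, so only one worker
    wins. This locks at the queue-entry level, not the directory level,
    and never blocks whoever is adding new entries.
    """
    src = os.path.join(qdir, name)
    dst = src + ".work"
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        return None  # somebody else claimed it first
    return dst


def deliver_one(qdir, name, hand_to_mta):
    """Claim an entry, hand it to the MTA, then delete it."""
    path = claim(qdir, name)
    if path is None:
        return False
    with open(path, "rb") as f:
        hand_to_mta(f.read())
    os.remove(path)
    return True


def run_bulkmailer(qdir, lock_path, hand_to_mta, workers=4):
    """Drain qdir using `workers` parallel infeeds.

    Returns the number of messages handed off, or None if another
    instance is already running.
    """
    lock = acquire_instance_lock(lock_path)
    if lock is None:
        return None
    total = 0
    try:
        while True:
            names = [n for n in os.listdir(qdir) if n.endswith(".msg")]
            if not names:
                break  # queue is empty: exit until cron starts us again
            with ThreadPoolExecutor(max_workers=workers) as pool:
                done = pool.map(
                    lambda n: deliver_one(qdir, n, hand_to_mta), names)
                total += sum(done)
        return total
    finally:
        lock.close()  # closing the file releases the flock


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    qdir = os.path.join(root, "outbound")
    os.makedirs(qdir)
    for i in range(20):
        with open(os.path.join(qdir, "%d.msg" % i), "wb") as f:
            f.write(b"message body")
    handed_off = []  # stand-in for the MTA
    print(run_bulkmailer(qdir, os.path.join(root, "lock"),
                         handed_off.append, workers=4))
```

The workers value is the tuning knob: each site would raise it until 
the MTA stays busy and back off when the MTA starts to thrash.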

So my recommendation is:

1) split the current qfiles into three queues: inbound, outbound and bounce

2) parallelize the outbound queue into "N" configurable delivery threads.

3) work on documentation on how to tune this for maximum performance 
with major MTAs, and how to tune MTAs for maximum performance.

That's a set of pretty easy updates, no technological miracles or 
black boxes, and solves all but the worst problems someone running 
Mailman is likely to see. And for sites this *doesn't* solve, it's 
either because they're doing the 5 pounds in a 1 pound bag thing, or 
they probably need to start hiring people like us to custom build 
something.

-- 
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

Be just, and fear not.