[Mailman-Users] Call for suggestions

Wed May 9 06:26:26 CEST 2001

On 5/8/01 5:26 PM, "Ashley M. Kirchner" <ashley at pcraft.com> wrote:

>   Generally, two to six at the most.

A few things to check...

Make sure your batch size is small:

SMTP_MAX_RCPTS = 10

Let your qrunner process live longer, and extend its lock lifetime. The relevant settings are:

QRUNNER_LOCK_LIFETIME = hours(10)
QRUNNER_PROCESS_LIFETIME = minutes(15)
QRUNNER_MAX_MESSAGES = 300

Set these to 20 hours, 2 hours, and 50000, respectively.
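In Mailman 2.0 these overrides would normally go in `mm_cfg.py`, after its `from Defaults import *` line, rather than in `Defaults.py` itself. A sketch of the values suggested above follows. The `hours()` helper is defined locally only so the fragment runs on its own; in a real `mm_cfg.py` you'd drop that definition and use the one the defaults import provides (an assumption worth checking against your copy of `Defaults.py`).

```python
# Sketch of mm_cfg.py overrides for the qrunner tuning described above.
# ASSUMPTION: in a real mm_cfg.py, hours() comes from "from Defaults import *";
# it is defined here only so this fragment runs standalone.

def hours(h):
    """Convert hours to seconds."""
    return h * 60 * 60

# Keep SMTP batches small.
SMTP_MAX_RCPTS = 10

# Let each qrunner run last much longer so it gets deeper into the queue directory.
QRUNNER_LOCK_LIFETIME = hours(20)      # was hours(10)
QRUNNER_PROCESS_LIFETIME = hours(2)    # was minutes(15)
QRUNNER_MAX_MESSAGES = 50000           # was 300
```

Both the old and the new values keep the lock lifetime well above the process lifetime. That seems like the safe direction: if the lock could expire while a long qrunner run is still working, presumably the next cron-started qrunner could break it and run alongside.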

Here's why: qrunner doesn't process the queue FIFO. Instead, it opens up the
directory and processes the entries in the order they appear there. So if
you get a lot of stuff in the queue, qrunner will quit after 15 minutes and
start over from the top.

As qrunner processes stuff, it deletes those files. New stuff that comes in
gets stored as close to the start of the directory inode as possible, in the
slots the deleted files freed up. So when qrunner quits and restarts without
having processed everything in the directory, it starts over with the newer
stuff, leaving older stuff deep in the inode. It'll never GET to that deep
stuff until the system quiets down and it's given a chance to catch up. That
means something could literally stay in the queue forever if the system
never slows down enough to allow qrunner to clean up.
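A toy model may make the starvation effect easier to see. It treats the queue directory as a list of slots where a new file takes the first free slot, and each qrunner run walks the slots in order and stops after a fixed number of messages (a stand-in for whichever of QRUNNER_PROCESS_LIFETIME or QRUNNER_MAX_MESSAGES runs out first). This is a simplification of how a real filesystem reuses directory entries, not Mailman's code; `simulate` and its parameters are hypothetical.

```python
def simulate(per_run, arrivals_per_run, runs, backlog=20):
    """Toy model of qrunner walking a queue directory in slot order.

    Returns (processed, still_queued). Messages are numbered in arrival
    order, so a lower number means an older message.
    """
    slots = []       # directory entries; None marks a freed slot
    next_id = 0

    def arrive():
        nonlocal next_id
        # A new file reuses the first free slot, else goes at the end.
        for i, entry in enumerate(slots):
            if entry is None:
                slots[i] = next_id
                break
        else:
            slots.append(next_id)
        next_id += 1

    for _ in range(backlog):
        arrive()

    processed = []
    for _ in range(runs):
        handled = 0
        # One qrunner run: walk slots from the start, quit after per_run messages.
        for i, entry in enumerate(slots):
            if handled == per_run:
                break
            if entry is not None:
                processed.append(entry)
                slots[i] = None
                handled += 1
        # More mail arrives before the next run and lands in the freed slots.
        for _ in range(arrivals_per_run):
            arrive()

    still_queued = sorted(e for e in slots if e is not None)
    return processed, still_queued


# Short runs: the same front slots get refilled and reprocessed every time.
done, stuck = simulate(per_run=5, arrivals_per_run=5, runs=10)
print("short runs, oldest still queued:", stuck[0])   # prints 5

# Long runs: the whole backlog is reached in the first pass.
done, stuck = simulate(per_run=25, arrivals_per_run=5, runs=10)
print("long runs, oldest still queued:", stuck[0])    # prints 65
```

With short runs, message 5 sits in the queue through all ten runs while fifty newer messages go past it; with long runs, only the latest arrivals are left waiting.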

By extending qrunner's lifetime, you're allowing it to go much deeper into
the inode. It's STILL not FIFO (this is why replies get seen before the
messages being replied to, and why digests have messages scrambled; Barry
and I have talked at length about this, and the queueing should be FIFO in
2.1), but you're a lot less likely to have a given message buried in the
queue for hours. It doesn't fix the problem, but (judging from my system's
operations) it significantly reduces it, and limits the impact when it does
happen.

If you still have stuff waiting around after you make this change, say
you're still running 2 hours behind, extend the process lifetime to 3 hours.
But be aware that if for some reason qrunner decides to wedge, you're
locking up your system for longer periods of time. Keeping an eye on things
is good.
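For the keeping-an-eye-on-things part, a small check that reports how long the oldest file in the queue directory has been waiting is one option. This is a sketch under a few assumptions: the queue lives in `~mailman/qfiles` (adjust `QUEUE_DIR` for your install), a file's modification time is a fair proxy for when it was queued, and the 2-hour threshold is just an example. `oldest_entry_age` and `check_queue` are hypothetical helpers, not part of Mailman.

```python
import os
import time

# ASSUMPTION: adjust to wherever your Mailman queue directory actually lives.
QUEUE_DIR = os.path.expanduser("~mailman/qfiles")

def oldest_entry_age(queue_dir, now=None):
    """Return the age in seconds of the oldest file in queue_dir (0.0 if empty)."""
    now = time.time() if now is None else now
    with os.scandir(queue_dir) as entries:
        mtimes = [e.stat().st_mtime for e in entries if e.is_file()]
    return now - min(mtimes) if mtimes else 0.0

def check_queue(queue_dir, limit_seconds):
    """Return a warning string if anything has waited longer than the limit, else None."""
    age = oldest_entry_age(queue_dir)
    if age > limit_seconds:
        return f"oldest entry in {queue_dir} is {age / 3600:.1f} hours old"
    return None
```

From cron, something like `msg = check_queue(QUEUE_DIR, 2 * 3600)`, followed by mailing or logging `msg` when it isn't `None`, would flag a wedged or badly backlogged qrunner early.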

If you are using sendmail, consider moving to postfix, where you can safely
configure the system to defer DNS lookups. That significantly speeds up
mailman's ability to queue messages. Sendmail has the same option
(DeliveryMode=defer), but you can't use it, because it disables anti-spam
checks and turns you into an open relay. Grr. If you're running anything
older than sendmail 8.11, upgrade, and spend some time configuring queue
subdirectories to reduce I/O load and contention in the directory structures
(QueueDirectory=/var/spool/mqueue/q*, with at least 5 q directories, each
with the appropriate df/qf/xf subdirectories).
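Setting up the sendmail queue subdirectories is mostly mkdir work. The sketch below assumes the layout described above: numbered `q1`...`qN` directories under `/var/spool/mqueue`, each holding `df`, `qf` and `xf` subdirectories (sendmail 8.10 and later use these, when present, to keep data, control and transcript files apart). `make_queue_dirs` is a hypothetical helper; the 0700 mode is an assumption, it doesn't set ownership, and you still have to point `QueueDirectory` at `/var/spool/mqueue/q*` in your sendmail configuration.

```python
import os

def make_queue_dirs(base="/var/spool/mqueue", count=5, mode=0o700):
    """Create base/q1..qN, each with df, qf and xf subdirectories.

    Returns the list of q directories. Ownership is NOT changed; set it
    yourself to whatever your sendmail expects.
    """
    created = []
    for n in range(1, count + 1):
        qdir = os.path.join(base, f"q{n}")
        for path in [qdir] + [os.path.join(qdir, sub) for sub in ("df", "qf", "xf")]:
            os.makedirs(path, exist_ok=True)
            # chmod explicitly: makedirs' mode is filtered by the umask.
            os.chmod(path, mode)
        created.append(qdir)
    return created
```

Run it as root while sendmail is stopped, and make sure nothing is left stranded in the old queue directory before you restart.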

If you aren't running DNS *on* that local machine, set up a caching-only
name server. You'll see a significant increase in your performance from
removing the network round trips, even if your current DNS server is on
your local LAN.
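To see what the caching-only name server buys you, you can time lookups before and after installing it. The sketch below is a rough proxy only: `socket.gethostbyname` does address lookups, while your MTA mostly does MX lookups, and the hostnames you pass should be domains your lists actually deliver to. `average_lookup_time` is a hypothetical helper; the `resolve` parameter is there so you can swap in another lookup function.

```python
import socket
import time

def average_lookup_time(hostnames, rounds=3, resolve=socket.gethostbyname):
    """Average seconds per lookup over several rounds.

    Failed lookups still count, since a slow failure costs as much time
    as a slow success.
    """
    timings = []
    for _ in range(rounds):
        for name in hostnames:
            start = time.perf_counter()
            try:
                resolve(name)
            except OSError:
                pass  # e.g. socket.gaierror for names that don't resolve
            timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings) if timings else 0.0
```

Running several rounds matters: with a caching server in place the first round pays for the cache misses, and the later rounds should drop sharply.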

>> -- going through the
>> bounce queues can clear up a significant amount of processing time, at very
>> little work. You just need to spend time watching the queues and logs and
>> clearing stuff out that gets in the way.
> 
>   I already have to do this.  I have sendmail configured so that it holds

Right here, I'm not talking about sendmail queues, but about your mailman
qfiles directory, and about watching your ~mailman/logs files (especially
bounces) for continuing unresolved bounces. Every bounce you get ends up
going through qrunner, and mailman won't successfully process all of them
and isn't very good at telling you so. Lots of stuff disappears silently
unless you're watching the logs, but we haven't figured out how best to
report this stuff without sending so much crap that admins throw it out...
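For spotting continuing unresolved bounces, one low-tech approach is to count how often each address shows up in the bounce log and look at the repeat offenders. The sketch below deliberately assumes nothing about the log's line format; it just pulls out anything that looks like an address, so list addresses that appear on many lines will show up too. `repeat_bouncers` and the threshold of 3 are hypothetical choices.

```python
import re
from collections import Counter

# Loose address pattern: good enough for spotting repeats, not for validation.
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def repeat_bouncers(lines, threshold=3):
    """Return (address, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(addr.lower() for addr in ADDRESS.findall(line))
    return [(addr, n) for addr, n in counts.most_common() if n >= threshold]
```

Any iterable of lines works, so you can pass it an open file such as your `~mailman/logs/bounce` and review whatever comes back.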

>   Has anyone written an administrator's guide to optimizing mailman?

No. No time. But the above ought to help. Maybe once 2.1 comes out and I
have time to beat it up; since so much is changing between 2.0 and 2.1,
writing one now makes no sense.

My big site is probably one of the top 10 mailman sites; I may be the 2nd
largest after Sourceforge, but I'm not positive. I'm still investigating how
to optimize things, but to be honest, since I'm still running sendmail and
can't use defer mode, it makes no sense to fix anything else until I fix
that. It's SUCH a huge slowdown that it needs to be resolved first. But that
means bringing up and figuring out postfix (I won't use qmail), and given
the size of my installation, I have to be careful about that. I'm waiting
for a machine to come back from the shop, and I'll install it there. Once
I'm comfortable there, it'll go on my small mailman machine. Once I'm
comfortable there, it'll go on my big one... That could take me two weeks,
it could take me two months.

But there's LOTS that can be done to installations to speed them up before
building clones and slaves and all that stuff. I wouldn't even THINK of
cloning until I had postfix fully optimized, for instance. If you don't
believe me, kill your sendmail daemon, start it up again with "-bd
-ODeliveryMode=defer", and compare how fast a message is queued in defer
mode against your normal sendmail setup. Mine is roughly twice as fast, and
since my primary delay is how fast qrunner can queue, that ONE change will
almost double my capacity, once I can get it in there safely.
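The defer-mode comparison is easy to turn into a repeatable test: time how quickly the local MTA accepts a batch of messages over SMTP, once with your normal sendmail and once with `-bd -ODeliveryMode=defer`. This sketch assumes sendmail is listening on localhost port 25 and that Mailman hands mail to it over SMTP; `submissions_per_second` is a hypothetical helper, and the sender and recipient should be a test address you control, since the messages really will be delivered eventually.

```python
import smtplib
import time
from email.message import EmailMessage

def submissions_per_second(count, sender, recipient,
                           host="localhost", port=25, smtp_class=smtplib.SMTP):
    """Submit `count` small test messages over one SMTP connection; return messages/second."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "queueing speed test"
    msg.set_content("Test message, please ignore.\n")

    start = time.perf_counter()
    with smtp_class(host, port) as conn:
        for _ in range(count):
            conn.send_message(msg, from_addr=sender, to_addrs=[recipient])
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Run it with the same count in both modes; the ratio of the two rates is the number to compare against the roughly 2x mentioned above.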

And, from what I can tell, postfix is MUCH faster than sendmail in general.
Beyond that, if you're running sendmail 8.9 or earlier, you're wasting a lot
of performance by being downrev. So you have two SIMPLE upgrades (to
sendmail 8.11 with a reconfigured set of queue directories, then to postfix)
that are a lot less work than all this cloning stuff. And by tuning how your
MTA works, adding a local DNS, tweaking qrunner, clearing queues, and fixing
bounces and stuff, you can really add capacity to an existing machine.

And 2.1, with its multiple queues, will fix the key performance issue with
mailman 2.0 and take advantage of all this other stuff as well. All without
hardware or hacks.