
Hi,
Could somebody please send me a config for a large mailman site?
We're seeing our mailman queue wedge fairly frequently. A simple restart sorts this out, but I'd like to see if there's anything else I can do to prevent this.
We have the following relevant options set:
QRUNNERS = [
    ('ArchRunner',     4), # messages for the archiver
    ('BounceRunner',   2), # for processing the qfile/bounces directory
    ('CommandRunner',  1), # commands and bounces from the outside world
    ('IncomingRunner', 4), # posts from the outside world
    ('NewsRunner',     1), # outgoing messages to the nntpd
    ('OutgoingRunner', 4), # outgoing messages to the smtpd
    ('VirginRunner',   1), # internally crafted (virgin birth) messages
    ('RetryRunner',    3), # retry temporarily failed deliveries
    ]

# Default queue stuff is shite.
QRUNNER_LOCK_LIFETIME = hours(20)
QRUNNER_PROCESS_LIFETIME = hours(2)
QRUNNER_MAX_MESSAGES = 50000
DELIVERY_RETRY_WAIT=minutes(10)
# Max recipients for each message
SMTP_MAX_RCPTS = 150
# Max messages sent in each SMTP connection
SMTP_MAX_SESSIONS_PER_CONNECTION = 25
ARCHIVE_TO_MBOX = 0
Cheers,
George
-- George Barnett Reality Engineer
m: (+44) 797 457 1868 e: george@alink.co.za
Digital circuits are made from analog parts. -- Don Vonada

At 10:31 AM +0000 2006-03-03, George Barnett wrote:
Could somebody please send me a config for a large mailman site?
In my experience, these kinds of things need to be developed specifically for a given site. SourceForge and lists.apple.com may both be exceptionally large Mailman sites, but their traffic patterns may not be remotely similar. And neither of them is likely to be anything like a monster announce-only list with 500k recipients, where announcements are made once a week or once a month.
We're seeing our mailman queue wedge fairly frequently. A simple restart sorts this out, but I'd like to see if there's anything else I can do to prevent this.
What do you mean by "wedge"? Which process(es) is/are failing or getting "wedged"?
We have the following relevant options set:
QRUNNERS = [
    ('ArchRunner',     4), # messages for the archiver
    ('BounceRunner',   2), # for processing the qfile/bounces directory
    ('CommandRunner',  1), # commands and bounces from the outside world
    ('IncomingRunner', 4), # posts from the outside world
    ('NewsRunner',     1), # outgoing messages to the nntpd
    ('OutgoingRunner', 4), # outgoing messages to the smtpd
    ('VirginRunner',   1), # internally crafted (virgin birth) messages
    ('RetryRunner',    3), # retry temporarily failed deliveries
    ]
For the mailing lists hosted on python.org (including the mailman-users mailing list), we have not found it necessary to modify the default QRUNNERS values.
I can't speak for any other Mailman site.
# Default queue stuff is shite.
QRUNNER_LOCK_LIFETIME = hours(20)
QRUNNER_PROCESS_LIFETIME = hours(2)
QRUNNER_MAX_MESSAGES = 50000
DELIVERY_RETRY_WAIT=minutes(10)
# Max recipients for each message
SMTP_MAX_RCPTS = 150
# Max messages sent in each SMTP connection
SMTP_MAX_SESSIONS_PER_CONNECTION = 25
ARCHIVE_TO_MBOX = 0
Again, for the mailing lists for python.org, we haven't found it necessary to modify any of these values from their defaults set in Defaults.py.
Most of my work in doing performance tuning for python.org has been within the MTA, and I've tried to make as much of that information available in the FAQ Wizard -- search for "performance".
-- Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
LOPSA member since December 2005. See <http://www.lopsa.org/>.

Brad Knowles wrote:
What do you mean by "wedge"? Which process(es) is/are failing or getting "wedged"?
The processes don't seem to be failing. New list mail is delivered fine, but what's 'stuck' in the queue doesn't leave until a restart, which effectively does a queue run.
For the mailing lists hosted on python.org (including the mailman-users mailing list), we have not found it necessary to modify the default QRUNNERS values.
These were increased after some queues were getting more full than others. Changing these values did help, but I suspect it's fixed the symptoms rather than the problem.
I can't speak for any other Mailman site.
# Default queue stuff is shite.
QRUNNER_LOCK_LIFETIME = hours(20)
QRUNNER_PROCESS_LIFETIME = hours(2)
QRUNNER_MAX_MESSAGES = 50000
DELIVERY_RETRY_WAIT=minutes(10)
# Max recipients for each message
SMTP_MAX_RCPTS = 150
# Max messages sent in each SMTP connection
SMTP_MAX_SESSIONS_PER_CONNECTION = 25
ARCHIVE_TO_MBOX = 0
Again, for the mailing lists for python.org, we haven't found it necessary to modify any of these values from their defaults set in Defaults.py.
I found this information in the FAQ on python.org. Our traffic pattern is very bursty. Lists are small, maybe 25 users per list, but when traffic comes in, it's a lot at the same time (monitoring mails).
Most of my work in doing performance tuning for python.org has been within the MTA, and I've tried to make as much of that information available in the FAQ Wizard -- search for "performance".
The MTA has already been tweaked to quite some degree.
-- George Barnett Reality Engineer
There's only one way to have a happy marriage and as soon as I learn what it is I'll get married again. -- Clint Eastwood

At 12:13 PM +0000 2006-03-03, George Barnett wrote:
The processes don't seem to be failing. New list mail is delivered fine, but what's 'stuck' in the queue doesn't leave until a restart, which effectively does a queue run.
Ahh. Because Mailman uses the filesystem as a queueing method, there is a certain FIFO nature to the messages that are processed, and if the queue is deep then new messages that come in won't be processed for a while -- the qrunner is going to be at a certain point in the inode file structure for the directory and is not going to look at earlier points in the directory, even if they are now free and would sort earlier in the process.
I believe that there is some discussion on this in FAQ 6.6.
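To see how deep each queue actually is when things appear stuck, something like the following could be run. It is only a minimal sketch in Python 3, not part of Mailman: the function name queue_depths is made up for illustration, and the qfiles location shown is an assumption that varies by installation. It counts the files in each queue subdirectory and reports the age of the oldest one, so a queue whose oldest entry keeps ageing stands out.

```python
import os
import time


def queue_depths(qfiles_dir):
    """Return {queue_name: (file_count, age_of_oldest_file_in_seconds)}.

    Hypothetical helper: every subdirectory of qfiles_dir is treated as
    one queue, and every regular file in it as one queued entry.
    """
    now = time.time()
    report = {}
    for name in sorted(os.listdir(qfiles_dir)):
        path = os.path.join(qfiles_dir, name)
        if not os.path.isdir(path):
            continue
        # Modification times of the regular files in this queue directory.
        mtimes = [
            entry.stat().st_mtime
            for entry in os.scandir(path)
            if entry.is_file()
        ]
        oldest_age = max(0.0, now - min(mtimes)) if mtimes else 0.0
        report[name] = (len(mtimes), oldest_age)
    return report


if __name__ == "__main__":
    # Assumed location; adjust to the site's actual qfiles directory.
    qdir = "/var/lib/mailman/qfiles"
    if os.path.isdir(qdir):
        for name, (count, age) in queue_depths(qdir).items():
            print("%-10s %6d files, oldest %.0f s" % (name, count, age))
```

Running it a few minutes apart during a burst would show whether one queue is draining slowly or not draining at all.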
These were increased after some queues were getting more full than others. Changing these values did help, but I suspect it's fixed the symptoms rather than the problem.
If you haven't already looked at FAQ 6.6, I'd encourage you to read it and compare it with your own experience, and the different values recommended as compared to your own. Using different values is fine, but you should understand what the values mean and why they are what they are.
One thing I can tell you is that the filesystem is a critical bottleneck for both MTAs and MLMs, and most of the performance tuning techniques for MTAs with regard to filesystems will be equally applicable to filesystem issues for Mailman and other MLMs.
Towards that end, I'd recommend you take a look at FAQ 6.3 as well, and the slides at <http://www.shub-internet.org/brad/papers/sendmail-tuning/> from my invited talk "Sendmail Performance Tuning for Large Systems" that I gave at SANE'98. With regard to disk and filesystem performance, pretty much everywhere it says "Sendmail", you can substitute "Mailman" instead, without loss of generality.
I found this information in the FAQ on python.org. Our traffic pattern is very bursty. Lists are small, maybe 25 users per list, but when traffic comes in, it's a lot at the same time (monitoring mails).
Whereas our traffic tends to be more consistent -- receiving anywhere from 5000 to 30,000 messages per hour, and around 400,000 total incoming messages per day, and rejecting about 200,000 messages per day (mostly spam).
By Mailman standards, this is actually only "moderate" size, even though a couple of months ago we were rejecting 90-95% of all incoming mail as spam, and handling 500,000-600,000 (or more) incoming messages per day that were legitimate.
Oh, and we've got some stuff we're doing with the firewall (blocking known abusive IP addresses for short periods of time), which doesn't show up in any of our logs. There's enough of that crap that we can't log that information because the impact of doing the logging would probably kill the machine.
-- Brad Knowles, <brad@stop.mail-abuse.org>

George Barnett wrote:
We have the following relevant options set:
QRUNNERS = [
    ('ArchRunner',     4), # messages for the archiver
    ('BounceRunner',   2), # for processing the qfile/bounces directory
    ('CommandRunner',  1), # commands and bounces from the outside world
    ('IncomingRunner', 4), # posts from the outside world
    ('NewsRunner',     1), # outgoing messages to the nntpd
    ('OutgoingRunner', 4), # outgoing messages to the smtpd
    ('VirginRunner',   1), # internally crafted (virgin birth) messages
    ('RetryRunner',    3), # retry temporarily failed deliveries
    ]
I doubt this is the whole problem, but the number of slices must be a power of two, even though this isn't checked (your RetryRunner count of 3 is not one). That is, your configuration will be accepted, but there may be issues with the retry queue.
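Since Mailman does not check this itself, a small standalone check can catch it before a restart. The sketch below is plain Python, not Mailman code; the helper names is_power_of_two and bad_slice_counts are made up for illustration, and the list is copied from the configuration quoted above.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for 0, negatives and everything else."""
    return n > 0 and (n & (n - 1)) == 0


def bad_slice_counts(qrunners):
    """Return the (runner, count) pairs whose count is not a power of two."""
    return [(name, count) for name, count in qrunners
            if not is_power_of_two(count)]


# Values copied from the configuration posted in this thread.
QRUNNERS = [
    ('ArchRunner',     4),
    ('BounceRunner',   2),
    ('CommandRunner',  1),
    ('IncomingRunner', 4),
    ('NewsRunner',     1),
    ('OutgoingRunner', 4),
    ('VirginRunner',   1),
    ('RetryRunner',    3),
    ]

print(bad_slice_counts(QRUNNERS))  # -> [('RetryRunner', 3)]
```

Changing the RetryRunner count to 2 or 4 would make the list come back empty.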
Also, you said later in the thread that processes don't seem to be failing. I assume you've checked both Mailman's 'error' and 'qrunner' logs, and also 'ps' when you have a problem, to see that all the runners are still there.
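The 'ps' part of that check can be scripted. The following is a rough sketch, not an official tool: it assumes each qrunner process shows a --runner=Name:slice:count argument on its command line, and the helper name missing_slices and the sample text are hypothetical. Feed it the output of ps (for example from 'ps auxww') together with the expected QRUNNERS list, and it returns the slices that have no matching process.

```python
import re

# Assumed command-line form of a running qrunner: --runner=Name:slice:count
RUNNER_ARG = re.compile(r"--runner=(\w+):(\d+):(\d+)")


def missing_slices(ps_text, qrunners):
    """Return sorted (runner, slice) pairs expected but absent from ps_text."""
    running = {(m.group(1), int(m.group(2)))
               for m in RUNNER_ARG.finditer(ps_text)}
    expected = {(name, i) for name, count in qrunners for i in range(count)}
    return sorted(expected - running)


# Hypothetical ps output: slice 1 of IncomingRunner has gone away.
sample = (
    "python qrunner --runner=IncomingRunner:0:2 -s\n"
    "python qrunner --runner=ArchRunner:0:1 -s\n"
)
print(missing_slices(sample, [("IncomingRunner", 2), ("ArchRunner", 1)]))
# -> [('IncomingRunner', 1)]
```

An empty result only means every expected process exists; a runner that is alive but not making progress would still need the 'error' and 'qrunner' logs to diagnose.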
--
Mark Sapiro <msapiro@value.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan