[Mailman-Users] Problem with qrunner and too much incoming mail
marc_news at valinux.com
Fri Nov 3 20:35:39 CET 2000
[I am not Ccing mailman-developers as this is not encouraged, but if someone
on both lists thinks it should be forwarded there, please feel free]
As some of you may know, sourceforge.net's mailing lists run on Mailman.
With the upgrade to the 2.0 branch, when deliveries were switched from
being directly handed out to the MTA to being spooled, and picked up by
qrunner, more mail started getting spooled than qrunner could process per
The problem is due to qrunner being single-threaded by default and having a
global lock. Because some mailing lists have subscribers in domains where
DNS is slow and unreliable, the MTA will hang on those RCPT TO commands
until DNS resolves or times out, and qrunner won't finish in time.
After that, it's all downhill: more mail queues up, qrunner falls even
further behind, and so on.
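To make the snowball effect concrete, here is a minimal sketch of the
backlog arithmetic. The numbers are made up for illustration, and the
function name is hypothetical (it is not part of Mailman); the only
assumption is that a fixed amount of mail arrives between runs and that a
run can clear at most a fixed number of messages.

```python
def backlog_over_time(arrivals_per_run, capacity_per_run, runs):
    """Return the number of messages still queued after each run."""
    backlog = 0
    history = []
    for _ in range(runs):
        # New mail is spooled while the previous run was busy.
        backlog += arrivals_per_run
        # A run clears at most capacity_per_run messages.
        backlog -= min(backlog, capacity_per_run)
        history.append(backlog)
    return history


# Hypothetical load: 120 messages arrive per run, but only 100 get processed.
print(backlog_over_time(120, 100, 5))  # → [20, 40, 60, 80, 100]
```

As long as arrivals exceed capacity, the queue grows linearly and never
drains on its own, which matches what we are seeing.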
We're currently playing with MTAs to optimize this a bit, but the real fix
is on the mailing list side. The options I can see:
- Forget about qrunner and switch back to direct delivery and queueing only
when direct delivery fails. Unfortunately, I'm told this is buggy, and
mail can be lost. Is this still true?
- Remove the locking in qrunner, run more than one qrunner at once, and hope
for the best ;-)
- Have a multithreaded qrunner that processes 10 or 20 mails at once
(talking to 10 or 20 instances of the MTA in parallel)
My understanding is that Python 2.0 has multithreading support and that
Mailman has some multithreading support. Is it something that could help
me and that we should be looking at?
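For illustration, the multithreaded option could look roughly like the
sketch below. It is written against a current Python standard library
(concurrent.futures), not Python 2.0, and it is not Mailman's code:
deliver() is a hypothetical stand-in for handing one message to the MTA,
and the "delay" field simulates a recipient domain with slow DNS.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def deliver(message):
    """Hypothetical stand-in for handing one message to the MTA."""
    time.sleep(message["delay"])  # simulates waiting on a slow RCPT TO
    return message["id"]


def run_queue(messages, workers=10):
    """Deliver messages with up to `workers` threads; return ids in queue order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order even if deliveries finish out of order.
        return list(pool.map(deliver, messages))


# One slow recipient no longer holds up the fast ones behind it.
queue = [{"id": 1, "delay": 0.2}, {"id": 2, "delay": 0.0}, {"id": 3, "delay": 0.0}]
print(run_queue(queue, workers=3))
```

Threads should help here because the time is spent waiting on the MTA
rather than computing, and Python threads release the interpreter lock
while blocked on I/O. Anything that touches a list's config would still
need the per-list lock.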
Thanks for your help, we have to fix this somehow or switch MLMs :-)
(or get killed by our users :-D)
Something else I'm looking at is load balancing.
One solution is to put X lists on each machine, but if you lose one machine,
you lose a portion of your lists.
Now, if I have X machines that mount /var/local/mailman, they'd be able to
service all the lists (config.db would get locked correctly), but I'd still
be stuck with only one queue runner because of the global lock.
That said, I *could* make mailman/data and mailman/qfiles symlinks to
somewhere on the local disk, and patch qrunner to put its lock in data.
This would allow for independent queue runners, but shared list configs and
shared locks on the list configs themselves.
Would that work?
Am I insane? :-)
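A rough sketch of the per-machine lock idea, to show what I mean. The
class and file names are hypothetical and this is not Mailman's own lock
implementation; it assumes the data directory really is on local disk,
since O_EXCL creation is not something I would rely on over NFS.

```python
import os
import tempfile


class LocalQueueLock:
    """Hypothetical per-machine qrunner lock kept in the local data directory."""

    def __init__(self, data_dir, name="qrunner.lock"):
        self.path = os.path.join(data_dir, name)
        self.held = False

    def acquire(self):
        """Try to take the lock; return False if another runner holds it."""
        try:
            # O_CREAT | O_EXCL fails if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        self.held = True
        return True

    def release(self):
        """Drop the lock if this object holds it."""
        if self.held:
            os.unlink(self.path)
            self.held = False


# Two temporary directories stand in for the local data dirs of two machines.
host_a = LocalQueueLock(tempfile.mkdtemp())
host_b = LocalQueueLock(tempfile.mkdtemp())
print(host_a.acquire(), host_b.acquire())  # each machine gets its own lock
host_a.release()
host_b.release()
```

With the lock under each machine's local data directory, the runners no
longer exclude each other, while the list configs on the shared mount
keep their existing locks.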
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger marc_f at merlins.org for PGP key