[Mailman-Developers] load balancing with mailman.

Barry A. Warsaw barry@zope.com
Tue, 29 Jan 2002 14:40:56 -0500

>>>>> "MM" == Marc MERLIN <marc_news@vasoftware.com> writes:

    MM> You _can_ export ~mailman over NFS. The problem was that with
    MM> linux 2.2 back then, under very high load and lock contention
    MM> (I sent 1000 messages to the same list on the two different
    MM> mail servers to force them to fight over
    MM> ~mailman/lists/listname/config.db), I was able to find a race
    MM> condition in NFS rename/unlinks which caused 3 messages out of
    MM> the 2000 to bounce.  (I don't know if it's under NFS only, but
    MM> after getting config.db corruption on 3 messages, when mailman
    MM> renames the config.db.last to config.db, there was a very
    MM> small time window when there was no config.db, and my exim
    MM> with auto list detection failed to stat the config.db, and
    MM> stated that the list didn't exist).

I believe that the algorithm that LockFile uses should be safe across
NFS, modulo system bugs of course.  I haven't run any stress tests
against it in a looong time, but I once did, and don't ever remember
seeing the bug you describe.  I also don't remember the kernel rev,
but it was a 2.2.something I'm sure.  Maybe I was just (un)luckier
than you.
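
For readers unfamiliar with this style of locking, here is a minimal
sketch of the classic hard-link technique that is generally considered
safe over NFS. It is an illustration under assumptions, not a copy of
Mailman's LockFile module: the function names, the `owner` parameter,
and the timeout handling are all hypothetical. The idea is that each
claimant writes a private file, tries to link() it to the shared lock
name, and then trusts the link count rather than the return value of
link().

```python
import os
import socket
import time


def acquire_link_lock(lockfile, timeout=5.0, poll=0.1, owner=None):
    """Try to take `lockfile` by hard-linking a private claim file to it.

    Returns the claim file path on success, or None if the timeout expires.
    All names here are hypothetical; this is not Mailman's API.
    """
    if owner is None:
        # Unique per host and process, so claimants on different
        # machines sharing the directory never use the same claim file.
        owner = "%s.%d" % (socket.gethostname(), os.getpid())
    claim = "%s.%s" % (lockfile, owner)
    with open(claim, "w") as fp:
        fp.write(owner)
    deadline = time.time() + timeout
    while True:
        try:
            os.link(claim, lockfile)
        except OSError:
            # Either someone else holds the lock, or link() reported an
            # error even though it worked. The link count below decides.
            pass
        if os.stat(claim).st_nlink == 2:
            return claim
        if time.time() >= deadline:
            os.unlink(claim)
            return None
        time.sleep(poll)


def release_link_lock(lockfile, claim):
    """Drop the shared lock name first, then the private claim file."""
    os.unlink(lockfile)
    os.unlink(claim)
```

A real implementation would also need lock lifetimes and stale-lock
breaking, which this sketch leaves out.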

    MM> Barry: With the new qrunner infrastructure, does qrunner still
    MM> need to lock the lists during delivery?  If qrunner doesn't
    MM> modify config.db anymore, could it open config.db read only?

Remember that we now usually have 7 queues, and each one has its own
runner process.  One of the advantages of this is that we can really
isolate lock acquisition to a finer granularity.  In fact,
OutgoingRunner -- which processes qfiles/out files, and thus is the
process that actually calls SMTPDirect -- does not lock the lists for
the normal delivery processing.  It simply shovels messages from the
queue to smtpd and doesn't need to update any list information, as
that's all done before the message gets to the outgoing queue.

There's one exception (of course ;).  If your smtpd ever returns
synchronous errors, then Mailman has to lock the list in order to
register bounces.  However, Mailman only does this periodically, and
this is controllable by the variable DEAL_WITH_PERMFAILURES_EVERY in
Mailman/Queue/OutgoingRunner.py (it's not a mm_cfg.py variable).
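
To make the "only periodically" idea concrete, here is a rough sketch
of batching permanent failures and registering them every N passes, so
the list lock is taken once per batch instead of once per failure. This
is an assumption about the general shape, not the code in
OutgoingRunner.py: the class, its methods, and the `register` callback
are hypothetical, and `every` merely plays the role that
DEAL_WITH_PERMFAILURES_EVERY plays in the description above.

```python
class PermFailureBatcher:
    """Collect permanent failures and hand them off every `every` passes.

    Hypothetical helper for illustration only. `register` is a callable
    taking (listname, addresses); it is where a real runner would lock
    the list and record the bounces.
    """

    def __init__(self, every, register):
        self.every = every
        self.register = register
        self.pending = {}   # listname -> list of failed addresses
        self.passes = 0

    def record(self, listname, address):
        # No list lock needed here; we only remember the failure.
        self.pending.setdefault(listname, []).append(address)

    def after_pass(self):
        """Call once per queue pass; returns how many failures were flushed."""
        self.passes += 1
        if self.passes % self.every != 0 or not self.pending:
            return 0
        flushed = 0
        for listname, addresses in self.pending.items():
            self.register(listname, addresses)
            flushed += len(addresses)
        self.pending = {}
        return flushed
```

With `every` set to 1 the failures are registered after each pass; a
larger value trades promptness of bounce registration for fewer lock
acquisitions.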

By default this is set to 1, but you could crank it up so that the
culling of the known bounces is done less frequently.  OTOH, if your
MTA is set up to never do recipient tests/deliveries synchronously (and
you're not delivering to local users), you should never have such
delivery failures to deal with, thus you'd never need to lock the