[Mailman-Developers] load balancing with mailman.

Marc MERLIN marc_news@vasoftware.com
Sat, 26 Jan 2002 21:38:31 -0800


[Barry, question for you further down]

On Wed, Jan 23, 2002 at 03:22:43PM -0800, Darrell Fuhriman wrote:
> Right now, it's essentially impossible to have more than one
> machine doing Mailman processing.  (Yes, there are ways to hack
> around it, but they get ugly quickly.)
 
I have a mail server which automatically rewrites the envelope recipients of
mailman list addresses so they go to the mailman-only mail server.
 
> It's also impossible to have the web interface on any machine
> other than the machine where list explosion happens.  This is

Yes and no.
Back in the mm2.0 days, one of my requirements for upgrading mailman on
sourceforge.net was to have redundancy and load balancing.

You _can_ export ~mailman over NFS. The problem was that with linux 2.2 back
then, under very high load and lock contention, I was able to find a race
condition in NFS rename/unlinks which caused 3 messages out of the 2000 to
bounce. (To trigger it, I sent 1000 messages to the same list on each of the
two mail servers to force them to fight over
~mailman/lists/listname/config.db.)
(I don't know if it happens under NFS only, but after getting config.db
corruption on 3 messages, when mailman renamed config.db.last to config.db,
there was a very small time window when there was no config.db, and my exim
with auto list detection failed to stat config.db and concluded that the
list didn't exist.)
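
To make the failure mode concrete, here is a minimal sketch of a list
existence check that tolerates a brief gap while config.db is being renamed.
It is only an illustration: the helper name list_exists and the retries and
delay values are hypothetical, and this is not how exim or mailman actually
perform the check.

```python
import os
import time


def list_exists(list_dir, retries=3, delay=0.05):
    """Return True if config.db shows up in list_dir within a few attempts.

    Hypothetical helper: retries and delay are made-up values, chosen only
    to illustrate riding out a short rename window.
    """
    path = os.path.join(list_dir, "config.db")
    for attempt in range(retries):
        try:
            os.stat(path)
            return True
        except FileNotFoundError:
            # The file may be mid-rename on the other server; wait briefly
            # before the next attempt, but not after the last one.
            if attempt < retries - 1:
                time.sleep(delay)
    return False
```

A check like this only narrows the window; it would not help if the file
stayed missing longer than retries * delay.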

For more details, see this message of mine in the archives:
From: Marc MERLIN <marc_news@valinux.com>
To: mailman-developers@python.org
Subject: Doing load balancing with mailman
Message-ID: <20001117130521.V9808@marc.merlins.org>
Date: Fri, 17 Nov 2000 13:05:21 -0800  

and the following thread:

From: Marc MERLIN <marc_news@valinux.com>
To: mailman-developers@python.org
Subject: about qrunner and locking
Message-ID: <20001207162234.D25463@marc.merlins.org>
Date: Thu, 7 Dec 2000 16:22:34 -0800

Barry:
With the  new qrunner infrastructure,  does qrunner  still need to  lock the
lists during delivery?
If qrunner  doesn't modify config.db  anymore, could it open  config.db read
only?

The reason I  ask is that, while the current  sourceforge.net list server is
still doing ok with 16,000+ lists, 600,000 Emails a day or so, and full SMTP
callbacks  on each  incoming message,  I'm  still getting  pressure to  load
balance the machine, especially for the high availability part :-)

If I can have qrunner not lock config.db, the only lock contention I'll have
is when  lists are modified  through the web, and  even if I  share ~mailman
over NFS,  I'm confident that we  won't hit some NFS  race condition because
the same list  is being modified by  two different admins at  the same exact
nanosecond.
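
A rough sketch of the split described above, under stated assumptions: the
delivery side reads the list state without any lock, and only the web side
writes, using a temp file plus rename. The names load_readonly and
save_atomically are hypothetical, pickle is just a stand-in for whatever
format config.db really uses, and locking between two writers is left out.
Whether rename is safe enough over NFS is exactly the open question here.

```python
import os
import pickle
import tempfile


def load_readonly(path):
    """Read the list state without taking any lock (delivery side)."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


def save_atomically(path, state):
    """Write to a temp file in the same directory, then rename over path.

    Writer-vs-writer locking (two admins saving at once) is not shown.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as fp:
            pickle.dump(state, fp)
            fp.flush()
            os.fsync(fp.fileno())
        # Readers see either the old file or the new one, at least on a
        # local POSIX filesystem; NFS behaviour is an assumption to verify.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

With this shape, a reader never needs the lock because it never sees a
half-written file, only the previous or the next complete version.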

> So, I'll probably be able to devote time over the next couple
> months to writing this, but I'm interested in how people feel
> such a beast should look, especially (4).

I think the NFS approach is the simplest by far :-)
It even  works today  if you don't  deliver thousands of  messages in  a few
seconds to the same list :-)
(actually with linux 2.4 or some other  OS, and mailman 2.1, the bug may not
be triggered anymore)

Marc
-- 
Microsoft is to operating systems & security ....
                                      .... what McDonalds is to gourmet cooking
  
Home page: http://marc.merlins.org/   |   Finger marc_f@merlins.org for PGP key