[Bug 1082308] [NEW] The qrunner-master lock file causes issues when running clustered

David Westlund davidw at axis.com
Fri Nov 23 11:28:09 CET 2012


Public bug reported:

Hi

Mailman can be run in a failover or load-balancing cluster; see:
http://wiki.list.org/pages/viewpage.action?pageId=4030621

When running a cluster, it is crucial to use:
* a shared directory for archive data
* a shared directory for locks
* separate directories for each qrunner

This can be implemented by setting the directories in mm_cfg.py, for example like this (where <host> is the host name of the node):
VAR_PREFIX      = '<shared dir>'
LIST_DATA_DIR   = os.path.join(VAR_PREFIX, 'lists')
LOCK_DIR        = os.path.join(VAR_PREFIX, 'locks')
DATA_DIR        = os.path.join(VAR_PREFIX, 'data')
SPAM_DIR        = os.path.join(VAR_PREFIX, 'spam')
LOG_DIR         = os.path.join(VAR_PREFIX, 'logs-<host>')
PUBLIC_ARCHIVE_FILE_DIR  = os.path.join(VAR_PREFIX, 'archives', 'public')
PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private')
# For qfiles and logs, <dir>-<host> is used to avoid conflicts
QUEUE_DIR       = os.path.join(VAR_PREFIX, 'qfiles-<host>')
INQUEUE_DIR     = os.path.join(QUEUE_DIR, 'in')
OUTQUEUE_DIR    = os.path.join(QUEUE_DIR, 'out')
CMDQUEUE_DIR    = os.path.join(QUEUE_DIR, 'commands')
BOUNCEQUEUE_DIR = os.path.join(QUEUE_DIR, 'bounces')
NEWSQUEUE_DIR   = os.path.join(QUEUE_DIR, 'news')
ARCHQUEUE_DIR   = os.path.join(QUEUE_DIR, 'archive')
SHUNTQUEUE_DIR  = os.path.join(QUEUE_DIR, 'shunt')
VIRGINQUEUE_DIR = os.path.join(QUEUE_DIR, 'virgin')
BADQUEUE_DIR    = os.path.join(QUEUE_DIR, 'bad')
RETRYQUEUE_DIR  = os.path.join(QUEUE_DIR, 'retry')
MAILDIR_DIR     = os.path.join(QUEUE_DIR, 'maildir')
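
To avoid editing <host> by hand on every node, the host name could be looked up when mm_cfg.py is loaded. The following is only a minimal sketch of that idea: the '/shared/mailman' path and the HOST variable are assumptions made for illustration, and only a few of the directories above are repeated.

```python
import os
import socket

# Hypothetical shared mount point; replace with the cluster's real shared dir.
VAR_PREFIX = '/shared/mailman'

# Short host name of this node, e.g. 'node1' for 'node1.example.com'.
# HOST is an illustrative helper name, not an existing Mailman setting.
HOST = socket.gethostname().split('.')[0]

# Shared between all nodes.
LOCK_DIR = os.path.join(VAR_PREFIX, 'locks')

# Per node, so that qrunners on different hosts never share queue or log files.
LOG_DIR = os.path.join(VAR_PREFIX, 'logs-' + HOST)
QUEUE_DIR = os.path.join(VAR_PREFIX, 'qfiles-' + HOST)
INQUEUE_DIR = os.path.join(QUEUE_DIR, 'in')
```

The remaining *_DIR settings would follow the same pattern, all derived from QUEUE_DIR.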

Unfortunately, the master-qrunner lock causes problems with this setup.
mailmanctl -s starts even if there is a master-qrunner lock file (provided
that there is no running mailmanctl on the host), making it possible to
get the service up and running on more than one host. Once a day,
however, mailmanctl checks the lock. If it does not hold it, it shuts
down. If you are running a cluster, at least one of the nodes will not
hold the lock, and the service will be shut down on that node.
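
The effect can be illustrated with a toy model. This is not Mailman's lock code: the SharedLock class, the node names, and the assumption that a forced start simply takes over the existing lock are all made up for illustration. The point is only that a single lock file has a single owner, so the periodic check can succeed on at most one node.

```python
class SharedLock(object):
    """Toy stand-in for the single master-qrunner file in a shared LOCK_DIR."""

    def __init__(self):
        self.owner = None

    def claim(self, node):
        # Assumption for this sketch: a forced start takes over the lock.
        self.owner = node


def keeps_running(lock, node):
    """Model of the periodic check: a node survives only if it owns the lock."""
    return lock.owner == node


lock = SharedLock()
nodes = ['node1', 'node2']

# Both nodes start successfully, one after the other.
for node in nodes:
    lock.claim(node)

# At the next periodic check, only the current lock owner stays up.
survivors = [node for node in nodes if keeps_running(lock, node)]
print(survivors)  # prints ['node2']
```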

To solve this, I propose that the LOCKFILE name in mailmanctl be made configurable, so instead of having:
LOCKFILE = os.path.join(mm_cfg.LOCK_DIR, 'master-qrunner')
Have:
LOCKFILE = os.path.join(mm_cfg.LOCK_DIR, mm_cfg.QRUNNER_LOCK_FILE)

Then add QRUNNER_LOCK_FILE = 'master-qrunner' in Defaults.py.

This would make it easy to have individual qrunner master lock files for
each node in a cluster.
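
As a rough sketch of how a node could then use the proposed setting: QRUNNER_LOCK_FILE is the name proposed in this report, while the '/shared/mailman' path and the two helper functions are hypothetical and exist only to make the example self-contained.

```python
import os
import socket

# Hypothetical shared lock directory, standing in for mm_cfg.LOCK_DIR.
LOCK_DIR = os.path.join('/shared/mailman', 'locks')

# Proposed default for Defaults.py: the same file name that is used today.
QRUNNER_LOCK_FILE = 'master-qrunner'


def lockfile_path(lock_dir, lock_name):
    """Build the master lock path as the proposed mailmanctl line would."""
    return os.path.join(lock_dir, lock_name)


def per_host_lock_name(hostname):
    """Hypothetical per-node override for mm_cfg.py: append the short host name."""
    return 'master-qrunner-' + hostname.split('.')[0]


# Without an override, the path is unchanged from the current behaviour.
default_path = lockfile_path(LOCK_DIR, QRUNNER_LOCK_FILE)

# With a per-host override, every node gets its own master lock file.
node_path = lockfile_path(LOCK_DIR, per_host_lock_name(socket.gethostname()))
```

Because the default stays 'master-qrunner', single-host installations would be unaffected; only nodes that override the setting in mm_cfg.py would get a separate lock file.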

** Affects: mailman
     Importance: Undecided
         Status: New


** Tags: cluster

-- 
You received this bug notification because you are a member of Mailman
Coders, which is subscribed to GNU Mailman.
https://bugs.launchpad.net/bugs/1082308
