Hi Roger,
First let me apologize on behalf of the project for the delay in distribution of your post. It appears a few posts got trapped in limbo for about 10 days, and the delay was definitely between Mailman's MTA and mine.
Roger writes:
I inherited a multi-server install of Mailman.
During a recent upgrade of the servers, I'm realizing the master-qrunner.pid file is in the 'data' directory which is shared between the two servers.
Which version of Mailman is this? I believe that recent versions put the PID file in the 'lock' directory.
On my Debian installation, that directory is /var/lib/mailman/lock, which is actually a symlink to /var/lock/mailman. I suspect this setup is intended to resolve exactly this kind of issue.
I have the lock file set to use the hostname in the name of the lock file.
Shouldn't the PID file be in a local directory?
Yes. You don't want the operation of local processes to be subject to network issues.
The old servers had a cron entry to restart Mailman every night. I'm wondering if there was flakiness with a shared PID file going on that was 'fixed' by restarting Mailman nightly.
If the name of the host is in the name of the lock file, this should not cause "flakiness" through a conflict between the systems. However, in my Debian install, I have both master-qrunner and master-qrunner.<host>.<pid>, hardlinked to the same file, which contains the full pathname (in /var/lib/mailman/lock).
I think it's more likely that either the whole system including the network was flaky, or that Mailman isn't designed to be robust in a multi-host configuration. It's generally designed to be robust against various failures, so it's probably OK, but "multihost operation with shared filesystem" was not an explicit design criterion.
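For reference, the hard-linked pair of files I described above is what you would expect from the usual hard-link locking technique for shared filesystems. Here is a minimal sketch of that general technique in Python. It is not Mailman's actual code: the function names acquire and release and the host and pid parameters are hypothetical, and I am only assuming the <lock>.<host>.<pid> naming that I see on my own system.

```python
import os
import socket


def acquire(lock_path, host=None, pid=None):
    """Try once to take the lock. Return the claim file path, or None."""
    # Assumed naming scheme: <lock>.<host>.<pid>, so two hosts sharing
    # the directory never write to the same claim file.
    host = host or socket.gethostname()
    pid = os.getpid() if pid is None else pid
    claim = "%s.%s.%d" % (lock_path, host, pid)
    # The claim file records its own full pathname.
    with open(claim, "w") as f:
        f.write(claim)
    try:
        # Creating a hard link fails if lock_path already exists, so
        # only one claimant can win.
        os.link(claim, lock_path)
    except FileExistsError:
        os.unlink(claim)  # somebody else holds the lock
        return None
    return claim


def release(lock_path, claim):
    """Drop the lock by removing both names for the file."""
    os.unlink(lock_path)
    os.unlink(claim)
```

While the lock is held, both names point at the same file, which is the state I see in my lock directory. Because the host name is part of the claim file's name, a process on the other server cannot mistake this claim for its own, even if the PIDs happen to match.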
Steve