[Mailman-Developers] Mailman on a cluster

Thu Jun 8 13:30:03 CEST 2006

--On 7 June 2006 15:26:48 -0400 Barry Warsaw <barry at python.org> wrote:

>
>> I'm looking at running Mailman on a cluster of servers, sharing a
>> single disk with Apple XSan.
>>
>> I presume that everything done through the web interface must be
>> resilient to multiple simultaneous attempts to modify - say - a list
>> membership. So, I expect that I can put the list databases on a
>> shared file system.
>
> Mailman's locking scheme is NFS-safe, and should be safe on all shared
> file systems that support POSIX semantics.  Given this, there should
> be no race conditions against the list databases, through any of the
> access mechanisms. Just make sure that if you have really huge lists,
> your cgi timeouts are set appropriately or your web server could kill
> the Mailman cgis while they wait for some other process's list lock to
> be released (this is the case even if you weren't clustered).  Make
> sure your list lock timeouts are set appropriately too because you
> really don't want them getting broken.
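
[A minimal sketch of the kind of NFS-safe locking Barry describes, assuming
the usual hard-link technique: create a uniquely named file, link() it to
the shared lock name, and trust the link count rather than link()'s return
value. This is not Mailman's own code. The names acquire, release and
LockTimeout are hypothetical, and lock lifetimes and stale-lock breaking
are left out.]

```python
import os
import socket
import time
import uuid


class LockTimeout(Exception):
    """Raised when the lock could not be taken before the timeout."""


def acquire(lockfile, timeout=10.0, poll=0.1):
    """Take the lock at `lockfile`; return the private temp file name."""
    # A name unique to this host, process and attempt (illustrative scheme).
    tmp = "%s.%s.%d.%s" % (lockfile, socket.gethostname(),
                           os.getpid(), uuid.uuid4().hex)
    with open(tmp, "w") as f:
        f.write(tmp)
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.link(tmp, lockfile)
        except FileExistsError:
            pass  # someone holds the lock, or an NFS reply was lost
        # The link count is the authoritative answer: 2 means the lock
        # name points at our file, so we own the lock.
        if os.stat(tmp).st_nlink == 2:
            return tmp
        if time.monotonic() >= deadline:
            os.unlink(tmp)
            raise LockTimeout(lockfile)
        time.sleep(poll)


def release(lockfile, tmp):
    """Drop the lock taken by acquire()."""
    os.unlink(lockfile)
    os.unlink(tmp)
```

[The timeout argument plays the role of the CGI-side wait mentioned above:
a caller that waits longer than the web server allows will be killed
before the lock is released.]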

That's good. It's an essential requirement that all the web servers see the 
same information.

>> But what about the queue runners? Is it safe to share the queues
>> between servers? If it's safe to run multiple queue runners on one
>> queue on one machine, then I should be OK - but I'm not sure that
>> Mailman does that. If not, then I'm probably safer leaving the queues
>> on unshared file systems, and accepting that some queued items
>> won't get processed while a cluster member is unavailable.
>
> This should be safe too.  Mailman uses a sha1 hash space slicing
> algorithm to ensure that each qfile is managed by exactly one qrunner
> process.  No locking is required as long as you configure your slices
> appropriately.  It's still the case though that should a cluster member
> go down, that portion of the hash space managed by those qrunners won't
> get processed.  It should be fairly easy to reconfigure a backup though
> to handle the hash space while the primary is off-line.
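
[A minimal sketch of hash space slicing as described above, assuming the
160-bit SHA-1 range is cut into equal contiguous slices and each qrunner
only picks up entries whose digest falls in its own slice. This is an
illustration, not Mailman's actual implementation; slice_bounds and
owning_slice are hypothetical names.]

```python
import hashlib

SHA1_MAX = (1 << 160) - 1  # largest value a 160-bit SHA-1 digest can take


def slice_bounds(slice_index, num_slices):
    """Return the inclusive (lower, upper) digest range for one slice."""
    chunk = (SHA1_MAX + 1) // num_slices
    lower = chunk * slice_index
    # The last slice absorbs any remainder so the whole space is covered.
    if slice_index == num_slices - 1:
        upper = SHA1_MAX
    else:
        upper = lower + chunk - 1
    return lower, upper


def owning_slice(data, num_slices):
    """Return the index of the slice responsible for `data` (bytes)."""
    digest = int(hashlib.sha1(data).hexdigest(), 16)
    for i in range(num_slices):
        lower, upper = slice_bounds(i, num_slices)
        if lower <= digest <= upper:
            return i
    raise AssertionError("slices do not cover the hash space")
```

[Because the ranges are disjoint and cover the whole space, every queue
entry has exactly one owner and no lock is needed. It also shows why a
slice goes unprocessed when its qrunner's host is down, until another
runner is configured with the same slice index.]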

OK, so I have to figure out how to configure the slices to avoid clashes, 
but still won't gain anything until I figure out a backup scheme. I have 
failover between cluster members, so that should be doable. On the other 
hand, I might just live with the downtime - cluster members don't tend to 
go off line for more than a few minutes.

Thanks Barry. I feel like I know what I'm doing now!

> -Barry

-- 
Ian Eiloart
IT Services, University of Sussex