[Mailman-Developers] Re: Big problems with stale lockfiles on large list...

Graham TerMarsch mailman@howlingfrog.com
Wed, 2 May 2001 11:30:59 -0700

On Tuesday 01 May 2001 13:40, you wrote:
> Here's what I did: I loaded up a list with 60000 subscribers, then
> went to the members page.  It did indeed take a long time, and if I
> let it run to completion, I get the page as expected and no locks.
> However, if I hit the stop button before the page is finished loading,
> I can see that the CGI process continues to run for a while and then
> it may or may not clear the locks.  The page is not complete.  Since
> sometimes the locks are cleared and sometimes they're left, it's
> pretty clear there are race conditions involved.
> This seems to work, in that the locks appear to be cleared in the
> cases where they were left lying around before.  But because of all
> the race conditions, I can't be 100% sure.
> If you've read this far, the implication is that if the user hits the
> stop button, Mailman will in essence abort any changes to list
> configuration that this invocation may have made.  Alternatively, we
> could try to save & unlock in the signal handler, but that raises the
> possibility of race conditions again.  Also, it makes sense to move
> the save of the list data into the try: part of the clause and only do
> the unlocking in the finally.  That way, the finally clause and the
> SIGTERM handler have the same semantics, and the list will get
> unlocked in the face of either an exception or a signal.  But the list
> database will only get saved on successful completion of the task.  I
> can live with those semantics (I think ;).
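
For readers following along, the semantics described in the quoted text
can be sketched roughly as below.  This is only an illustration in
modern Python, not the actual patch: DemoList and run_locked are
hypothetical names standing in for the real list object and CGI driver,
and the real locking code is assumed to be more involved.

```python
import signal
import sys


class DemoList:
    """Hypothetical stand-in for a lockable mailing list object."""

    def __init__(self):
        self.locked = False
        self.saved = False

    def lock(self):
        self.locked = True

    def unlock(self):
        # Idempotent in this sketch, so unlocking twice is harmless.
        self.locked = False

    def save(self):
        self.saved = True


def run_locked(mlist, task):
    """Run task(mlist) under the list lock; save only if it succeeds."""

    def sigterm_handler(signum, frame):
        # Same semantics as the finally clause: unlock, never save.
        mlist.unlock()
        sys.exit(0)

    mlist.lock()
    previous = signal.signal(signal.SIGTERM, sigterm_handler)
    try:
        task(mlist)
        mlist.save()    # reached only on successful completion
    finally:
        mlist.unlock()  # runs on success, exception, or sys.exit()
        signal.signal(signal.SIGTERM, previous)
```

With this shape, an exception or a SIGTERM partway through the task
leaves the list unlocked but unsaved, which matches the "abort any
changes" behaviour described above.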

Barry, wanted to thank you muchly for the lengthy description of the 
problem and the patch that you provided.  I figured that this was probably 
what was happening, after having gone through the process of running the 
CGIs repeatedly myself here.

From the initial testing that I've done, it appears that the patch that 
you provided does work (near as I can tell so far), and has helped 
eliminate the dangling/stale lockfile problems that we've been having.

As for the semantics of "save the list only if everything was successful", 
I too believe that those are livable (and likely proper) semantics to live 
with.

Will let you know again if we continue to have this problem, but from what 
I've seen so far this appears to have fixed the major fire that I've had.  
Now all I've got to figure out is how to try to speed up the admin CGIs so 
that they don't take two or three minutes to load when dealing with large 
lists.

Thanks again Barry,

Graham TerMarsch