
Spyro Polymiadis wrote:
Just posting what I believe was the solution. I saw that there were a bunch of lock files in the lock directory matching the list name, and a bunch of files in the queue all destined to that list, some marked as .bak instead of .pck. So I deleted the .bak files and the lock files and restarted Mailman, and it looks like everything is flowing again.
Any ideas on what may have caused this, so I can look out for it and maybe prevent it in the future?
It is unclear why there was more than one .bak file unless you are running multiple slices for IncomingRunner. One .bak file would be the 'backup' entry for the message currently being processed by the IncomingRunner that was waiting for the lock.
It appears that something, possibly a web CGI, died and left the list locked. The lock files themselves contained a lot of information, but they are gone now. See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.076.htp>.
At this point, there may be information in Mailman's 'error', 'locks' and/or 'qrunner' logs that would help.
If in fact it was a single stale lock causing the problem, it would have been sufficient to remove just the one <listname>.lock.<hostname>.<pid>.<seq> file whose <pid> no longer existed and whose name was the contents of the <listname>.lock file, and then also remove the <listname>.lock file itself.
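For illustration, here is a minimal Python sketch of that check. The locks directory and list name are assumptions (adjust for your install), and pulling the pid out of the holder's name assumes the <pid>.<seq> suffix in the naming convention above. Stop Mailman before removing anything.

    import errno
    import os

    LOCKDIR = '/usr/local/mailman/locks'   # assumption: adjust for your install
    LISTNAME = 'mylist'                    # hypothetical list name

    def pid_alive(pid):
        # Signal 0 probes for the process without actually signaling it.
        try:
            os.kill(pid, 0)
        except OSError as e:
            return e.errno != errno.ESRCH
        return True

    lockfile = os.path.join(LOCKDIR, LISTNAME + '.lock')
    # The <listname>.lock file's contents name the lock entry of the
    # current holder, e.g. <listname>.lock.<hostname>.<pid>.<seq>.
    with open(lockfile) as fp:
        holder = os.path.basename(fp.read().strip())

    pid = int(holder.split('.')[-2])       # assumes the <pid>.<seq> suffix
    if not pid_alive(pid):
        os.remove(os.path.join(LOCKDIR, holder))
        os.remove(lockfile)
        print('Removed stale lock held by dead pid %d' % pid)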
Note also that by deleting the .bak files, you probably lost the messages they contained. Had you left them, they would have been 'recovered' when you restarted Mailman.
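A manual equivalent of that recovery amounts to renaming each .bak entry back to .pck with Mailman stopped, so the runner re-processes it on the next start. A sketch, assuming the incoming queue directory shown (the path varies by install):

    import os

    QDIR = '/usr/local/mailman/qfiles/in'  # assumption: adjust for your install

    # With Mailman stopped, turn each backup entry back into a queue
    # entry so it is picked up again at the next start.
    for name in os.listdir(QDIR):
        if name.endswith('.bak'):
            base = name[:-len('.bak')]
            os.rename(os.path.join(QDIR, name),
                      os.path.join(QDIR, base + '.pck'))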
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan