Follow up to a Lock problem.

Hi All,
I've summarized my issue below. For context, I have around 215K subscribers on a dedicated server with a quad Xeon CPU and 4 GB of RAM. Most notably, Mailman uses 2 GB or more of RAM and completely maxes out my CPU in top on CentOS; Python runs at about 98% to 100%. I suspect this is from a mail drop in progress, but that doesn't explain the problem below. Please read on.
Mailman Version: 2.1.9
Symptoms: About a month ago, the web interface at "http://mysite.com/mailman/admin/mylist" would lock up on login, and I couldn't access that list. I could access the other lists without problems. Mark S. pointed me to the FAQ at <http://wiki.list.org/x/noA9>, and that temporarily relieved the problem. Now I find that I have to delete all my locks every time I want to go to a new page in the admin interface. I have included some errors from the locks log below. Can you assist me?
Errors:
[root@x mailman]# tail -50 /var/log/mailman/locks
Oct 26 21:13:56 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
Oct 26 21:13:56 2010 (17763)     mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
Oct 26 21:13:56 2010 (17763)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
Oct 26 21:13:56 2010 (17763)     self.__lock.lock(timeout)
Oct 26 21:13:56 2010 (17763)   File "/usr/lib/mailman/Mailman/LockFile.py", line 287, in lock
Oct 26 21:13:56 2010 (17763)     self.__linkcount(), important=True)
Oct 26 21:13:56 2010 (17763)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Oct 26 21:13:56 2010 (17763)     traceback.print_stack(file=logf)
Oct 26 21:13:58 2010 (17763) xlt.lock unexpected linkcount: 1
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/bin/qrunner", line 278, in ?
Oct 26 21:13:58 2010 (17763)     main()
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/bin/qrunner", line 238, in main
Oct 26 21:13:58 2010 (17763)     qrunner.run()
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
Oct 26 21:13:58 2010 (17763)     filecnt = self._oneloop()
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 112, in _oneloop
Oct 26 21:13:58 2010 (17763)     self._onefile(msg, msgdata)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 170, in _onefile
Oct 26 21:13:58 2010 (17763)     keepqueued = self._dispose(mlist, msg, msgdata)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/VirginRunner.py", line 38, in _dispose
Oct 26 21:13:58 2010 (17763)     return IncomingRunner._dispose(self, mlist, msg, msgdata)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
Oct 26 21:13:58 2010 (17763)     mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
Oct 26 21:13:58 2010 (17763)     self.__lock.lock(timeout)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/LockFile.py", line 287, in lock
Oct 26 21:13:58 2010 (17763)     self.__linkcount(), important=True)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Oct 26 21:13:58 2010 (17763)     traceback.print_stack(file=logf)
Oct 26 21:13:58 2010 (17763) xlt.lock unexpected linkcount: 1
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/bin/qrunner", line 278, in ?
Oct 26 21:13:58 2010 (17763)     main()
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/bin/qrunner", line 238, in main
Oct 26 21:13:58 2010 (17763)     qrunner.run()
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run
Oct 26 21:13:58 2010 (17763)     filecnt = self._oneloop()
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 112, in _oneloop
Oct 26 21:13:58 2010 (17763)     self._onefile(msg, msgdata)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 170, in _onefile
Oct 26 21:13:58 2010 (17763)     keepqueued = self._dispose(mlist, msg, msgdata)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/VirginRunner.py", line 38, in _dispose
Oct 26 21:13:58 2010 (17763)     return IncomingRunner._dispose(self, mlist, msg, msgdata)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 115, in _dispose
Oct 26 21:13:58 2010 (17763)     mlist.Lock(timeout=mm_cfg.LIST_LOCK_TIMEOUT)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/MailList.py", line 161, in Lock
Oct 26 21:13:58 2010 (17763)     self.__lock.lock(timeout)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/LockFile.py", line 287, in lock
Oct 26 21:13:58 2010 (17763)     self.__linkcount(), important=True)
Oct 26 21:13:58 2010 (17763)   File "/usr/lib/mailman/Mailman/LockFile.py", line 416, in __writelog
Oct 26 21:13:58 2010 (17763)     traceback.print_stack(file=logf)
[root@x mailman]#
-- Llewellyn G.S. Curran

Llewellyn Curran wrote:
When we attempt to obtain a lock, the process attempting the lock first writes a file named, e.g. <listname>.lock.<hostname>.<pid>.<counter> and then attempts to create a hard link to that file named <listname>.lock.
In the case above, the OS returned an EEXIST error to the attempted link meaning that the <listname>.lock file existed (xlt.lock in this case), but 'unexpected linkcount: 1' says that file is not linked to any <listname>.lock.<hostname>.<pid>.<counter> file or any other file.
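The sequence described above can be sketched in Python roughly as follows. This is an illustration only, not the code in Mailman's LockFile.py: the names try_lock and linkcount are hypothetical, and writing the claim file's own name into the file is an assumption made for the example.

```python
import errno
import os
import socket


def try_lock(lock_dir, listname, counter=0):
    """Try once to take <listname>.lock; return (acquired, claim_path)."""
    lockfile = os.path.join(lock_dir, listname + ".lock")
    # Per-process claim file: <listname>.lock.<hostname>.<pid>.<counter>
    claim = "%s.%s.%d.%d" % (lockfile, socket.gethostname(),
                             os.getpid(), counter)
    with open(claim, "w") as fp:
        fp.write(claim)  # assumption: the file records who made the claim
    try:
        # Creating a hard link is atomic: it fails with EEXIST if the
        # lock file already exists.
        os.link(claim, lockfile)
    except OSError as e:
        os.unlink(claim)
        if e.errno != errno.EEXIST:
            raise
        return False, None
    return True, claim


def linkcount(path):
    """Number of hard links to path; 2 is expected while a lock is held."""
    return os.stat(path).st_nlink
```

While the lock is held normally, the lock file and the claim file are two names for the same inode, so linkcount() on the lock file gives 2. If the claim file disappears while <listname>.lock remains, the count drops to 1, which is the state the log reports.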
When this happens, the contents of that xlt.lock file should give the hostname and pid of the process that obtained the lock. That may help.
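As a rough aid for that check, here is a small Python sketch that reads a lock file and reports its contents, its link count, and whether the recorded pid is still alive. The function names are hypothetical, and the parsing assumes the contents end in <hostname>.<pid>.<counter> as in the naming scheme above; if your file looks different, just read it by hand. The liveness test is only meaningful when run on the host named in the file.

```python
import errno
import os


def pid_alive(pid):
    """True if a process with this pid exists on the local host."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence, sends nothing
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def describe_lock(lockfile):
    """Return (contents, link_count, pid, alive) for a lock file."""
    with open(lockfile) as fp:
        contents = fp.read().strip()
    nlink = os.stat(lockfile).st_nlink
    pid = None
    alive = None
    # Assumed layout: ...<hostname>.<pid>.<counter>; the hostname may
    # itself contain dots, so split from the right.
    parts = contents.rsplit(".", 2)
    if len(parts) == 3 and parts[1].isdigit():
        pid = int(parts[1])
        alive = pid_alive(pid)
    return contents, nlink, pid, alive
```

Call describe_lock() with the full path to xlt.lock in your Mailman locks directory. A pid that is no longer running suggests a stale lock left behind by a process that died or was killed.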
Also see the FAQ at <http://wiki.list.org/x/_4A9> for information about ensuring that only one mailmanctl and one set of qrunners are running and make sure that's the case.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
