[Bug 1077908] [NEW] MailList configuration race condition between qrunners
Public bug reported: Hi, If two qrunner processes access list configuration file in the same second, second qrunner proces does not reload the configuration file, because it thinks that it has not changed. The decision is made according to config file's modification time, which is always in seconds (not in miliseconds) on older systems. This breaks mlist.post_id incrementation if the first process is IncomingRunner and second process ArchRunner and leads to situation where two emails have the same sequence number. The flow is following (everything happens in the same second): AR (ArchRunner) loads config file with post_id=1 AR saves config file with post_id=1 IR (IncomingRunner) loads config file with post_id=1 IR increments post_id and saves config file with post_id=2 AR loads config file, but mlist.__timestamp == mtime, so it thinks it has the latest config file AR uses post_id=1 (instead of post_id=2) AR saves config file with post_id=1 (the bug happens - post_id=2 is replaced with post_id=1 in config file) Second problem is that during the GENERAL_PIPELINE, in some cases the lock file is unlocked and next mlist.Save() fails. The patch checks for this case and try to Lock() the mailing list again. ** Affects: mailman Importance: Undecided Status: New -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
** Patch added: "mailman-lock-race.patch" https://bugs.launchpad.net/mailman/+bug/1077908/+attachment/3431197/+files/m... -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
Hm, when thinking about it, my patch wasn't the best thing for the race- condition. It will load config file everytime now, I think. But I think you got the idea what's wrong there. -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
*** This bug is a duplicate of bug 266464 *** https://bugs.launchpad.net/bugs/266464 What mailman version are you running? I think this duplicates LP:266464 which was fixed in Mailman 2.1.15. Please confirm. ** Changed in: mailman Importance: Undecided => Medium ** Changed in: mailman Status: New => Fix Committed ** Changed in: mailman Milestone: None => 2.1.15 ** Changed in: mailman Assignee: (unassigned) => Mark Sapiro (msapiro) ** This bug has been marked a duplicate of bug 266464 Subscriber "disappears" after subscription -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
*** This bug is a duplicate of bug 266464 *** https://bugs.launchpad.net/bugs/266464 Regarding your second problem of the list being unlocked by a handler during processing of the pipeline (I assume you mean GLOBAL_PIPELINE, not GENERAL_PIPELINE), I have never seen a report of this. Can you say which handler or under what circumstance this happens? -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
*** This bug is a duplicate of bug 266464 *** https://bugs.launchpad.net/bugs/266464 Also see the post at <http://mail.python.org/pipermail/mailman- developers/2011-September/021524.html> and other posts linked therefrom. -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
*** This bug is a duplicate of bug 266464 ***
https://bugs.launchpad.net/bugs/266464
Sorry, this is mailman-2.1.9 and mailman-2.1.12. You are right it's
fixed in 2.1.15, thanks.
This is my debugging log showing the second problem. I reproduced it
only once and my test case was sending lot of emails per second (I
basically sent 10000 emails using more threads).
It prints "SAVE post_id" or "LOAD post_id" in the end of mlist.Save()
and mlist.Load() methods followed by traceback (to find out what called
those methods - it was useful to debug the first problem). You can see
the lock problem in the end.
Oct 15 06:47:27 2012 (20378) SAVE 9345
[' File "/usr/lib/mailman/bin/qrunner", line 278, in ?\n main()\n', ' File "/usr/lib/mailman/bin/qrunner", line 238, in main\n qrunner.run()\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run\n filecnt = s
elf._oneloop()\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop\n self._onefile(msg, msgdata)\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 176, in _onefile\n keepqueued = self._dispose(m
list, msg, msgdata)\n', ' File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 133, in _dispose\n mlist.Save()\n', ' File "/usr/lib/mailman/Mailman/MailList.py", line 561, in Save\n syslog(\'error\', \'SAVE %d\\n%s\\n\',
self.post_id, repr(traceback.format_stack()))\n']
Oct 15 06:47:28 2012 (20378) SAVE 9346
[' File "/usr/lib/mailman/bin/qrunner", line 278, in ?\n main()\n', ' File "/usr/lib/mailman/bin/qrunner", line 238, in main\n qrunner.run()\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run\n filecnt = s
elf._oneloop()\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop\n self._onefile(msg, msgdata)\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 176, in _onefile\n keepqueued = self._dispose(m
list, msg, msgdata)\n', ' File "/usr/lib/mailman/Mailman/Queue/IncomingRunner.py", line 133, in _dispose\n mlist.Save()\n', ' File "/usr/lib/mailman/Mailman/MailList.py", line 561, in Save\n syslog(\'error\', \'SAVE %d\\n%s\\n\',
self.post_id, repr(traceback.format_stack()))\n']
Oct 15 06:47:28 2012 (20380) LOAD 9346
[' File "/usr/lib/mailman/bin/qrunner", line 278, in ?\n main()\n', ' File "/usr/lib/mailman/bin/qrunner", line 238, in main\n qrunner.run()\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run\n filecnt = s
elf._oneloop()\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 113, in _oneloop\n self._onefile(msg, msgdata)\n', ' File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 176, in _onefile\n keepqueued = self._dispose(m
list, msg, msgdata)\n', ' File "/usr/lib/mailman/Mailman/Queue/OutgoingRunner.py", line 70, in _dispose\n mlist.Load()\n', ' File "/usr/lib/mailman/Mailman/MailList.py", line 666, in Load\n syslog(\'error\', \'LOAD %d\\n%s\\n\',
self.post_id, repr(traceback.format_stack()))\n']
Oct 15 06:47:28 2012 (20378) Uncaught runner exception:
** This bug is no longer a duplicate of bug 266464 Subscriber "disappears" after subscription ** Changed in: mailman Status: Fix Committed => Triaged ** Changed in: mailman Status: Triaged => Incomplete -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
Reopening this bug for the second issue only. The first issue is a duplicate of LP:266464. The second issue requires further study. ** Changed in: mailman Milestone: 2.1.15 => None -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
It seems that what happened is the NotLockedError and traceback at the end of comment #6 occurred when IncomingRunner attempted to Save() the list (two log messages earlier in comment #6) and found it not locked, or possibly this was the Save() after the next message which wasn't yet logged. In either case, this may have only occurred because of LP:266464. Pending further information and evidence that this can occur with the fix for LP:266464 installed, I'm leaving this 'incomplete'. -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
When I reproduced this second bug, my fix for the first bug had not been applied yet. I was not able to reproduce the second bug with your patch for LP:266464. I think you can close this bug. Thank you for all the information and help. -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
*** This bug is a duplicate of bug 266464 *** https://bugs.launchpad.net/bugs/266464 ** This bug has been marked a duplicate of bug 266464 Subscriber "disappears" after subscription -- You received this bug notification because you are a member of Mailman Coders, which is subscribed to GNU Mailman. https://bugs.launchpad.net/bugs/1077908 Title: MailList configuration race condition between qrunners To manage notifications about this bug go to: https://bugs.launchpad.net/mailman/+bug/1077908/+subscriptions
participants (2)
-
Jan Kaluza
-
Mark Sapiro