[ mailman-Patches-1008983 ] qrunner w/ multiple slices crashing.

Patches item #1008983, was opened at 2004-08-13 21:45
Message generated for change (Settings changed) made by tkikuchi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=300103&aid=1008983&group_id=103

Category: mail delivery
Group: Mailman 2.1
Status: Open
Resolution: None
Priority: 7
Submitted By: Brian Greenberg (grnbrg)
Assigned to: Tokio Kikuchi (tkikuchi)
Summary: qrunner w/ multiple slices crashing.

Initial Comment:
When running qrunner with multiple instances of a particular class (i.e.: qrunner -r OutgoingRunner:0:4 -r OutgoingRunner:1:4 -r OutgoingRunner:2:4 -r OutgoingRunner:3:4), the qrunner processes for this class will periodically crash, leaving the following traces:

logs/qrunner:

Aug 13 15:27:51 2004 (29188) Master qrunner detected subprocess exit
(pid: 23829, sig: None, sts: 1, class: OutgoingRunner, slice: 1/4) [restarting]

logs/error:

Aug 13 15:27:51 2004 qrunner(23829): Traceback (most recent call last):
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/bin/qrunner", line 270, in ?
Aug 13 15:27:51 2004 qrunner(23829):     main()
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/bin/qrunner", line 230, in main
Aug 13 15:27:51 2004 qrunner(23829):     qrunner.run()
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 70, in run
Aug 13 15:27:51 2004 qrunner(23829):     filecnt = self._oneloop()
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
Aug 13 15:27:51 2004 qrunner(23829):     msg, msgdata = self._switchboard.dequeue(filebase)
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 143, in dequeue
Aug 13 15:27:51 2004 qrunner(23829):     fp = open(filename)
Aug 13 15:27:51 2004 qrunner(23829): IOError: [Errno 2] No such file or directory: '/var/priv/mail/mailman/qfiles/out/1092428866.8410051+70dcb0bb96e6460d8cd2aa8103cce318cfa3ed1f.pck'

This is caused by a logic error in mailman/Mailman/Queue/Switchboard.py:files. Specifically, when there are not multiple slices running for a particular qrunner class, self.__upper and self.__lower are both set to None in Switchboard.py:__init__.

Switchboard.py:files contains the statement:

    if not lower or (lower <= long(digest, 16) < upper):
        times[float(when)] = filebase

i.e.: if there is only one slice (because "lower" is "None"), or if this filename is within the range of the slice that this qrunner is managing, then add it to the list.

The problem is that the first slice of any multi-slice qrunner has a lower bound of "0", and "not lower" is true for 0 just as it is for None. This means that slice "0" of any multi-slice qrunner will act on *all* files in a given queue. That in turn results in race conditions in which slice 0 and slice n both begin to process the same message: one completes processing and removes the file, and the other crashes when it tries to open the file that is no longer there.

Patch:

*** Switchboard.py      Fri Aug 13 16:43:12 2004
--- Switchboard.py_new  Fri Aug 13 16:43:48 2004
***************
*** 164,170 ****
              when, digest = filebase.split('+')
              # Throw out any files which don't match our bitrange.  BAW: test
              # performance and end-cases of this algorithm.
!             if not lower or (lower <= long(digest, 16) < upper):
                  times[float(when)] = filebase
          # FIFO sort
          keys = times.keys()
--- 164,170 ----
              when, digest = filebase.split('+')
              # Throw out any files which don't match our bitrange.  BAW: test
              # performance and end-cases of this algorithm.
!             if (lower == upper) or (lower <= long(digest, 16) < upper):
                  times[float(when)] = filebase
          # FIFO sort
          keys = times.keys()

Brian Greenberg -- grnbrg@cc.umanitoba.ca
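
The truthiness problem described in the comment above can be reproduced in isolation. The following is a minimal sketch, not Mailman code: slice_bounds(), original_test() and patched_test() are hypothetical names, and the way the digest space is divided among slices is an assumption made only so that slice 0 gets a lower bound of 0, as the report states. It is written for a current Python, so int() stands in for long(). The digest is the one from the filename in the traceback.

```python
# Sketch only: the bound arithmetic is an assumption for illustration and is
# not copied from Mailman's Switchboard.py.
SPACE = 1 << 160  # number of distinct 40-hex-digit digest values


def slice_bounds(slice_no, numslices):
    """Hypothetical helper: return (lower, upper) for one slice.

    A single-slice runner gets (None, None), as described in the report.
    """
    if numslices == 1:
        return None, None
    lower = SPACE * slice_no // numslices
    upper = SPACE * (slice_no + 1) // numslices
    return lower, upper


def original_test(lower, upper, digest):
    # Condition quoted in the report: `not lower` is true for None,
    # but it is also true for a lower bound of 0.
    return bool(not lower or (lower <= int(digest, 16) < upper))


def patched_test(lower, upper, digest):
    # Condition from the patch: lower == upper holds only when both
    # bounds are None, i.e. when there is a single slice.
    return bool((lower == upper) or (lower <= int(digest, 16) < upper))


if __name__ == "__main__":
    digest = "70dcb0bb96e6460d8cd2aa8103cce318cfa3ed1f"
    for slice_no in range(4):
        lower, upper = slice_bounds(slice_no, 4)
        print(slice_no,
              original_test(lower, upper, digest),
              patched_test(lower, upper, digest))
```

Under these assumptions the original condition lets both slice 0 and slice 1 claim the file, while the patched condition leaves it to slice 1 alone, which matches the slice 1/4 crash in logs/qrunner. An explicit "lower is None" comparison would presumably behave the same way as the patched condition for the single-slice case, but that variant is not part of the submitted patch.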