[ mailman-Patches-1008983 ] qrunner w/ multiple slices crashing.

SourceForge.net noreply at sourceforge.net
Fri Oct 22 09:49:14 CEST 2004


Patches item #1008983, was opened at 2004-08-13 21:45
Message generated for change (Comment added) made by tkikuchi
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=300103&aid=1008983&group_id=103

Category: mail delivery
Group: Mailman 2.1
Status: Open
Resolution: None
Priority: 7
Submitted By: Brian Greenberg (grnbrg)
Assigned to: Tokio Kikuchi (tkikuchi)
Summary: qrunner w/ multiple slices crashing.

Initial Comment:
When running qrunner with multiple instances of a
particular class (i.e., qrunner -r OutgoingRunner:0:4
-r OutgoingRunner:1:4 -r OutgoingRunner:2:4
-r OutgoingRunner:3:4), the qrunner processes for this
class periodically crash, leaving the following
traces:

logs/qrunner:

Aug 13 15:27:51 2004 (29188) Master qrunner detected subprocess exit (pid: 23829, sig: None, sts: 1, class: OutgoingRunner, slice: 1/4) [restarting]

logs/error:

Aug 13 15:27:51 2004 qrunner(23829): Traceback (most recent call last):
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/bin/qrunner", line 270, in ?
Aug 13 15:27:51 2004 qrunner(23829):     main()
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/bin/qrunner", line 230, in main
Aug 13 15:27:51 2004 qrunner(23829):     qrunner.run()
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 70, in run
Aug 13 15:27:51 2004 qrunner(23829):     filecnt = self._oneloop()
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
Aug 13 15:27:51 2004 qrunner(23829):     msg, msgdata = self._switchboard.dequeue(filebase)
Aug 13 15:27:51 2004 qrunner(23829):   File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 143, in dequeue
Aug 13 15:27:51 2004 qrunner(23829):     fp = open(filename)
Aug 13 15:27:51 2004 qrunner(23829): IOError: [Errno 2] No such file or directory: '/var/priv/mail/mailman/qfiles/out/1092428866.8410051+70dcb0bb96e6460d8cd2aa8103cce318cfa3ed1f.pck'

This is caused by a logic error in
mailman/Mailman/Queue/Switchboard.py:files().
Specifically, when only a single slice is running for
a particular qrunner class, self.__upper and
self.__lower are both set to None in
Switchboard.py:__init__. Switchboard.py:files contains
the statement:

if not lower or (lower <= long(digest, 16) < upper):
    times[float(when)] = filebase

i.e., if there is only one slice (because lower is
None) or if this file's digest falls within the range
of the slice that this qrunner is managing, then add
it to the list.

The problem is that the first slice of any multi-slice
qrunner has a lower bound of 0, and in Python "not 0"
is just as true as "not None". This means that slice 0
of any multi-slice qrunner will act on *all* files in
a given queue. That in turn causes race conditions:
slice 0 and slice n both begin to process a message,
one completes processing and removes the file, and the
other crashes when it tries to open the missing file.
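The failure can be reproduced outside Mailman with a few lines of Python. This is a minimal sketch and not Mailman's code. It makes three assumptions:

* The helpers `slice_bounds()` and `buggy_accepts()` are hypothetical.
* The 160-bit SHA-1 digest space is split evenly into four slices. Mailman's exact bound arithmetic may differ.
* Python 3 `int()` stands in for the Python 2 `long()` used in the original.

```python
# Sketch of the original slice filter from Switchboard.files().
# Python 3 int() replaces the Python 2 long() used in Mailman 2.1.

SPACE = 2 ** 160  # SHA-1 hex digests are 160-bit numbers


def slice_bounds(slice_no, numslices):
    """Hypothetical even split of the digest space; not Mailman's code."""
    lower = SPACE * slice_no // numslices
    upper = SPACE * (slice_no + 1) // numslices
    return lower, upper


def buggy_accepts(digest, lower, upper):
    """The original test: 'not lower' is meant to detect a single slice."""
    return not lower or (lower <= int(digest, 16) < upper)


# Digest from the queue file named in the traceback.
digest = "70dcb0bb96e6460d8cd2aa8103cce318cfa3ed1f"

# Which of four slices claim this file?  Slice 0 claims it only because
# its lower bound is 0 and 'not 0' is True.
claimants = [s for s in range(4) if buggy_accepts(digest, *slice_bounds(s, 4))]
print(claimants)  # → [0, 1]
```

Under this split, the digest belongs to slice 1, which matches the "slice: 1/4" in the qrunner log. Slice 0 also picks it up, so the two processes race for the same file.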

Patch:

*** Switchboard.py      Fri Aug 13 16:43:12 2004
--- Switchboard.py_new  Fri Aug 13 16:43:48 2004
***************
*** 164,170 ****
              when, digest = filebase.split('+')
              # Throw out any files which don't match our bitrange.  BAW: test
              # performance and end-cases of this algorithm.
!             if not lower or (lower <= long(digest, 16) < upper):
                  times[float(when)] = filebase
          # FIFO sort
          keys = times.keys()
--- 164,170 ----
              when, digest = filebase.split('+')
              # Throw out any files which don't match our bitrange.  BAW: test
              # performance and end-cases of this algorithm.
!             if (lower == upper) or (lower <= long(digest, 16) < upper):
                  times[float(when)] = filebase
          # FIFO sort
          keys = times.keys()
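One way to check the patched condition is to confirm that every digest is claimed by exactly one slice. The following is a minimal sketch, not Mailman's code. It makes these assumptions:

* The helpers `slice_bounds()`, `patched_accepts()` and `owners()` are hypothetical.
* The 160-bit digest space is split evenly, rather than with Mailman's own bound arithmetic.
* Python 3 `int()` stands in for `long()`.

```python
import hashlib

SPACE = 2 ** 160  # SHA-1 hex digests are 160-bit numbers


def slice_bounds(slice_no, numslices):
    """Hypothetical even split of the digest space; not Mailman's code."""
    lower = SPACE * slice_no // numslices
    upper = SPACE * (slice_no + 1) // numslices
    return lower, upper


def patched_accepts(digest, lower, upper):
    """The patched test: lower == upper holds when both are None."""
    return (lower == upper) or (lower <= int(digest, 16) < upper)


def owners(digest, numslices):
    """Return every slice number whose filter would pick up this digest."""
    return [s for s in range(numslices)
            if patched_accepts(digest, *slice_bounds(s, numslices))]


# With the patch, each of 1000 sample digests is owned by exactly one slice.
for i in range(1000):
    sample = hashlib.sha1(str(i).encode()).hexdigest()
    assert len(owners(sample, 4)) == 1

# A single slice (bounds left as None) still accepts every file.
assert patched_accepts("0" * 40, None, None)
```

The `lower == upper` test relies on both bounds being None in the single-slice case. It would also be true for a slice whose range happened to be empty, but this even split cannot produce an empty range for four slices.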


Brian Greenberg
--
grnbrg at cc.umanitoba.ca


----------------------------------------------------------------------

>Comment By: Tokio Kikuchi (tkikuchi)
Date: 2004-10-22 07:49

Message:
Logged In: YES 
user_id=67709

I am testing Barry's patch on one of my working servers
(moderate size, ~3000 lists). I am also trying the
setting (OutgoingRunner 2). If nothing goes wrong in a
day or two, this will go into CVS.


----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2004-10-22 01:52

Message:
Logged In: YES 
user_id=12800

Better: the "if not lower" should probably be changed to "if
lower is None".
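Barry's suggestion separates the single-slice case (no bounds at all) from a legitimate lower bound of zero. Below is a minimal sketch of the filter written that way. The `accepts()` helper is hypothetical. The example bounds assume an even four-way split of the 160-bit digest space, with Python 3 `int()` in place of `long()`.

```python
def accepts(digest, lower=None, upper=None):
    """Slice filter using Barry's suggested explicit None check."""
    if lower is None:
        # No bounds were set: this is the only slice, so take every file.
        return True
    return lower <= int(digest, 16) < upper


digest = "70dcb0bb96e6460d8cd2aa8103cce318cfa3ed1f"

# 'not 0' is True, but '0 is None' is False, so a zero lower bound no
# longer makes slice 0 of 4 grab files outside [0, 2**158).
print(accepts(digest))                      # → True  (single slice)
print(accepts(digest, 0, 2 ** 158))         # → False (slice 0 of 4)
print(accepts(digest, 2 ** 158, 2 ** 159))  # → True  (slice 1 of 4)
```

Compared with `lower == upper`, the explicit `is None` check states the intent directly. It also does not depend on how `upper` happens to be initialized.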


----------------------------------------------------------------------



More information about the Mailman-coders mailing list