Over the weekend, one of the users on one of my lists had both of their DNS servers go down. This meant that every attempt to deliver mail to them hit a 4 minute DNS failure timeout.
This went fine until about 5 messages for them had accumulated in the qfiles directory. Then each qrunner that started up would try those 5 messages first, fail on each one (5 x 4 minutes is roughly 20 minutes of timeouts), and exceed its 15 minutes of runtime before getting to anything else in the queue. This cycle would repeat forever.
It seems to me that qrunner should at least try messages in most-recent-first order instead of oldest-first. But that might still lead to starvation of some messages in the queue.
A better approach would be for the qrunner to leave a file around that records where in the queue it had gotten to, so the next qrunner process could start on the next message in the queue. This should lead to the whole queue being serviced before any element is serviced a second time. Does this make sense?
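The checkpoint idea could be sketched roughly as follows. This is not Mailman's actual qrunner code: the cursor file name, the function names, the `deliver` callback, and the assumption that queue file names sort in queue order are all hypothetical and only serve to illustrate the rotation.

```python
import os
import time

CURSOR_FILE = "qrunner.cursor"  # hypothetical name for the checkpoint file


def load_cursor(qdir):
    """Return the name of the last entry attempted, or "" if none is recorded."""
    try:
        with open(os.path.join(qdir, CURSOR_FILE)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return ""


def save_cursor(qdir, name):
    """Record the last entry attempted, replacing the old checkpoint atomically."""
    tmp = os.path.join(qdir, CURSOR_FILE + ".tmp")
    with open(tmp, "w") as f:
        f.write(name)
    os.replace(tmp, os.path.join(qdir, CURSOR_FILE))


def rotate(entries, cursor):
    """Sort entries, then move everything up to and including cursor to the end."""
    entries = sorted(entries)
    later = [e for e in entries if e > cursor]
    earlier = [e for e in entries if e <= cursor]
    return later + earlier


def run_queue(qdir, deliver, budget_seconds, clock=time.monotonic):
    """Attempt queue entries, starting just after the previous run's checkpoint.

    deliver(path) is assumed to return True on success. A failed entry stays
    in the queue, but the cursor still moves past it, so the next run begins
    with the entry that follows it.
    """
    deadline = clock() + budget_seconds
    names = [n for n in os.listdir(qdir) if not n.startswith(CURSOR_FILE)]
    for name in rotate(names, load_cursor(qdir)):
        if clock() >= deadline:
            break  # out of time; the next run resumes after the saved cursor
        path = os.path.join(qdir, name)
        if deliver(path):
            os.remove(path)
        save_cursor(qdir, name)
```

Because the cursor is saved after every attempt, a run that is cut short still leaves an accurate checkpoint, and a handful of slow-failing messages can delay the rest of the queue by at most one pass instead of blocking it indefinitely.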
I'm too new to Mailman and Python to take a stab at this myself, but I wanted to report the bug. If others agree that it's a bug, and no one else takes a shot at it, I might. But I believe it will require some non-trivial modifications to qrunner.
-=- Matthew L. Seidl    email: seidl@cs.colorado.edu    =-=
=-= Graduate Student    Project . . . What Project?    -=-
-=- http://www.cs.colorado.edu/~seidl/Home.html    -Morrow Quotes    =-=
=-= http://www.cs.colorado.edu/~seidl/lawsuit    -=-