Hello,
We are running mailman 2.1.9. Recently we had a problem that mailman's Incoming qrunner died hard, which caused messages accumulated under qfiles/in directory. Restarting mailman didn't help. We finally identified a message on top of the queue, moved it aside, restarted mailman again, then mail will start flow.
Putting that message back to the in queue will trigger the problem. The sender actually tried to send it again, and it caused the same problem.
Before I got into the details of mailman logs, I'd like to ask:
What's the best way to detect "stalled" condition like this? We do monitor qrunner processes but in this case it didn't help because the qrunners were still running. Now I put in a script to monitor the number of files in "in" queue, and if it reaches a threshold, sends alert.
Why this message didn't get moved to shunt directory?
Can the mailman recover itself without human intervention?
Details are followed:
The following error message will repeat a couple of times in error log when this message was processed:
Feb 20 07:55:53 2007 qrunner(25722): Traceback (most recent call last): Feb 20 07:55:53 2007 qrunner(25722): File "/var/lib/mailman/bin/qrunner", line 278, in ? Feb 20 07:55:53 2007 qrunner(25722): main() Feb 20 07:55:53 2007 qrunner(25722): File "/var/lib/mailman/bin/qrunner", line 238, in main Feb 20 07:55:53 2007 qrunner(25722): qrunner.run() Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 71, in run Feb 20 07:55:53 2007 qrunner(25722): filecnt = self._oneloop() Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/mailman/Mailman/Queue/Runner.py", line 100, in _oneloop Feb 20 07:55:53 2007 qrunner(25722): msg, msgdata = self._switchboard.dequeue(filebase) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/mailman/Mailman/Queue/Switchboard.py", line 164, in dequeue Feb 20 07:55:53 2007 qrunner(25722): msg = email.message_from_string(msg, Message.Message) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/__init__.py", line 52, in message_from_string Feb 20 07:55:53 2007 qrunner(25722): return Parser(_class, strict=strict).parsestr(s) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Parser.py", line 75, in parsestr Feb 20 07:55:53 2007 qrunner(25722): return self.parse(StringIO(text), headersonly=headersonly) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Parser.py", line 64, in parse Feb 20 07:55:53 2007 qrunner(25722): self._parsebody(root, fp, firstbodyline) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Parser.py", line 240, in _parsebody Feb 20 07:55:53 2007 qrunner(25722): msgobj = self.parsestr(part) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Parser.py", line 75, in parsestr Feb 20 07:55:53 2007 qrunner(25722): return self.parse(StringIO(text), headersonly=headersonly) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Parser.py", line 64, in parse Feb 20 07:55:53 2007 qrunner(25722): self._parsebody(root, fp, firstbodyline) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Parser.py", line 146, in _parsebody Feb 20 07:55:53 2007 qrunner(25722): boundary = container.get_boundary() Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Message.py", line 743, in get_boundary Feb 20 07:55:53 2007 qrunner(25722): boundary = self.get_param('boundary', missing) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Message.py", line 608, in get_param Feb 20 07:55:53 2007 qrunner(25722): for k, v in self._get_params_preserve(failobj, header): Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Message.py", line 555, in _get_params_preserve Feb 20 07:55:53 2007 qrunner(25722): params = Utils.decode_params(params) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Utils.py", line 337, in decode_params Feb 20 07:55:53 2007 qrunner(25722): charset, language, value = decode_rfc2231(EMPTYSTRING.join(value)) Feb 20 07:55:53 2007 qrunner(25722): File "/usr/lib/python2.3/email/Utils.py", line 284, in decode_rfc2231 Feb 20 07:55:53 2007 qrunner(25722): charset, language, s = parts Feb 20 07:55:53 2007 qrunner(25722): ValueError : unpack list of wrong size
The qrunner log shows: Feb 20 07:55:52 2007 (27933) Master qrunner detected subprocess exit (pid: 27941, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:53 2007 (25722) IncomingRunner qrunner started. Feb 20 07:55:53 2007 (27933) Master qrunner detected subprocess exit (pid: 25722, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:53 2007 (25723) IncomingRunner qrunner started. Feb 20 07:55:53 2007 (27933) Master qrunner detected subprocess exit (pid: 25723, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:53 2007 (25724) IncomingRunner qrunner started. Feb 20 07:55:53 2007 (27933) Master qrunner detected subprocess exit (pid: 25724, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:53 2007 (25725) IncomingRunner qrunner started. Feb 20 07:55:53 2007 (27933) Master qrunner detected subprocess exit (pid: 25725, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:53 2007 (25726) IncomingRunner qrunner started. Feb 20 07:55:54 2007 (27933) Master qrunner detected subprocess exit (pid: 25726, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:54 2007 (25727) IncomingRunner qrunner started. Feb 20 07:55:54 2007 (27933) Master qrunner detected subprocess exit (pid: 25727, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:54 2007 (25728) IncomingRunner qrunner started. Feb 20 07:55:54 2007 (27933) Master qrunner detected subprocess exit (pid: 25728, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:54 2007 (25729) IncomingRunner qrunner started. Feb 20 07:55:54 2007 (27933) Master qrunner detected subprocess exit (pid: 25729, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:54 2007 (25731) IncomingRunner qrunner started. Feb 20 07:55:55 2007 (27933) Master qrunner detected subprocess exit (pid: 25731, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting] Feb 20 07:55:55 2007 (25734) IncomingRunner qrunner started. Feb 20 07:55:55 2007 (27933) Master qrunner detected subprocess exit (pid: 25734, sig: None, sts: 1, class: IncomingRunner, slice: 1/4) [restarting]