
On 4/30/2012 5:19 PM, Jérôme wrote:
Mark Sapiro a écrit:
How about /var/log/mailman/qrunner ?
Each day, I have something like this : Apr 28 03:16:33 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:33 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:33 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:33 2012 (17093) Master watcher caught SIGHUP. Re-opening log files. Apr 28 03:16:34 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:34 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:34 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:34 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs. Apr 28 03:16:34 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.
That is presumably a logrotate process.
The day it stopped, I got this : Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17093) Master watcher caught SIGHUP. Re-opening log files. Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner exiting. Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner exiting. Apr 29 03:16:29 2012 (17094) ArchRunner qrunner exiting. Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs. Apr 29 03:16:29 2012 (17096) CommandRunner qrunner exiting. Apr 29 03:16:29 2012 (17098) NewsRunner qrunner exiting. Apr 29 03:16:29 2012 (17095) BounceRunner qrunner exiting. Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGTERM. Stopping. Apr 29 03:16:29 2012 (17101) RetryRunner qrunner exiting. Apr 29 03:16:29 2012 (17100) VirginRunner qrunner exiting.
So here in addition to the normal SIGHUPs, presumably from logrotate, you also have SIGTERMs possibly from a "bin/mailmanctl stop" although you don't show a "Master watcher caught SIGTERM" entry.
Check your system cron log to see what was running at the time.
Sorry for the mess, here. But I think you get the idea.
Seems to happen during a cron job.
Bug reports that could be related : http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=505638 https://bugs.launchpad.net/mailman/+bug/265855
modified /var/lib/mailman/Mailman/Handlers/SMTPDirect.py to add self.__conn.set_debuglevel(1)
And yet you are not logging any smtp debugging in Mailman's error log. There should be copious log information for every outgoing message.
There was. But it stopped. Last message for which I do have a lot of info is on Apr 22, one week before mailman stopped sending messages.
-rw-rw-r-- 1 list list 198 Apr 30 03:16 /var/log/mailman/error -rw-rw-r-- 1 list list 0 Apr 22 03:16 /var/log/mailman/error.1 -rw-rw-r-- 1 list list 0 Apr 15 03:16 /var/log/mailman/error.2 -rw-rw-r-- 1 list list 36541617 Apr 22 01:59 /var/log/mailman/error.3
Should there be anything relevant in there ?
No. That isn't relevant to this issue. Apparently, something reversed the patch on April 22 which is just as well.
The curious thing is you are rotating logs weekly, but "reopening" them daily. Either this is two different processes or your logrotate script is a bit strange.
Configuration
Not sure this is useful, but /etc/mailman/mm_cfg.py contains MTA='LocalPostfix'
The above line should cause significant problems when attempting to create or remove lists. it MUST be one of
MTA = 'Postfix' MTA = 'Manual' MTA = None
'Postfix' means generate aliases and virtual-mailman files for Postfix. 'Manual' means display the necessary aliases None means don't do anything with aliases when lists are created/removed.
I configured mailman 3 years ago. I don't remember everything but it comes from here : http://isp-control.net/documentation/howto/mail/setup_mailman
Which is wrong as it contains lines like
MTA=Postfix
and
MTA=localPostfix
If you put either of those lines literally in mm_cfg.py without quotes around Postfix or localPostfix, Mailman won't run at all because every Mailman process will encounter a fatal error on importing mm_cfg.
Apparently you figured that out as you have quoted it.
MTA='LocalPostfix'
and presumably you have named your edited module LocalPostfix.py rather than localPostfix.py or it would be throwing errors when creating/removing lists.
Is it such a bad idea ?
No, because the howto you followed had you create Mailman/MTA/localPostfix.py (or LocalPostfix.py), but how was I supposed to know that?
See the FAQ at <http://wiki.list.org/x/OIDD>.
I suppose it is unrelated, anyway.
Yes, it is unrelated.
Good thing is there is a relatively recent bug opened on debian that might be closed if we managed to rootcause and solve this.
I just did a little bit of cleanup tonight, after I realized the server was almost full. At least the partition that hosts mailman queues and logs. Would we see something specific in case of lack of space ?
You would begin to see exceptions thrown for inability to create/write files.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan