
Hi.
Thanks for answering.
Mon, 30 Apr 2012 16:15:03 -0700 Mark Sapiro wrote:
2/ Cron/mailmanctl
ps auxww| grep mailmanctl |grep -v grep -> Nothing.
How about
ps auxww| grep qrunner |grep -v grep
Nothing either.
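For repeated checks, the two ps pipelines above can be folded into one small script. This is only a sketch, assuming Python 3 is available on the server; the function names are made up for illustration, and the only command it runs is the same `ps auxww` used above.

```python
import subprocess

# Process names looked for in this thread.
NAMES = ("mailmanctl", "qrunner")


def mailman_processes(ps_output, names=NAMES):
    """Return the ps output lines that mention any of the given names."""
    return [line for line in ps_output.splitlines()
            if any(name in line for name in names)]


def running_mailman_processes():
    """Run `ps auxww` and keep only the Mailman-related lines."""
    result = subprocess.run(["ps", "auxww"], capture_output=True, text=True)
    return mailman_processes(result.stdout)


if __name__ == "__main__":
    found = running_mailman_processes()
    # An empty list corresponds to the "Nothing" results reported above.
    print("\n".join(found) if found else "No mailmanctl or qrunner process found.")
```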
7/ Locks
/var/lib/mailman/locks -> /var/lock/mailman
ll /var/lock/mailman -> total 0
It appears that some process or person is stopping Mailman.
OK. Need to figure out which.
8/ Logs
/var/log/mailman/error:
Apr 30 03:16:21 2012 mailmanctl(11685): No child with pid: 17093
Apr 30 03:16:21 2012 mailmanctl(11685): [Errno 3] No such process
Apr 30 03:16:21 2012 mailmanctl(11685): Stale pid file removed.
How about /var/log/mailman/qrunner ?
Each day, I have something like this:
Apr 28 03:16:33 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
Apr 28 03:16:34 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.
The day it stopped, I got this:
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner exiting.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner exiting.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner exiting.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner exiting.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner exiting.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner exiting.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner exiting.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner exiting.
Sorry for the mess, here. But I think you get the idea.
Seems to happen during a cron job.
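To compare a normal day with the day it stopped without reading the raw log, the "caught SIGxxx" entries can be grouped per process. The following is a rough sketch, assuming Python 3 and assuming every entry follows the exact layout of the lines quoted above; the regular expression and the function names are illustrative and are not part of Mailman.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping.
#   Apr 29 03:16:29 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8} \d{4}) \((?P<pid>\d+)\) "
    r"(?P<name>.+?) (?:qrunner )?caught (?P<signal>SIG[A-Z]+)"
)


def signals_by_process(lines):
    """Map (pid, name) to the list of signals that process logged, in order."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            key = (int(match.group("pid")), match.group("name"))
            seen[key].append(match.group("signal"))
    return dict(seen)


def terminated(lines):
    """Return the sorted names of processes that logged a SIGTERM."""
    return sorted(name
                  for (_, name), signals in signals_by_process(lines).items()
                  if "SIGTERM" in signals)


if __name__ == "__main__":
    # Log path as given earlier in this thread.
    with open("/var/log/mailman/qrunner") as log:
        for name in terminated(log):
            print(name)
```

On the Apr 28 entries this would report nothing, while on the Apr 29 entries it would list every runner, which matches what can be read by eye above.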
Bug reports that could be related:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=505638
https://bugs.launchpad.net/mailman/+bug/265855
modified /var/lib/mailman/Mailman/Handlers/SMTPDirect.py to add self.__conn.set_debuglevel(1)
And yet you are not logging any smtp debugging in Mailman's error log. There should be copious log information for every outgoing message.
There was, but it stopped. The last message for which I have a lot of information is from Apr 22, one week before Mailman stopped sending messages.
-rw-rw-r-- 1 list list      198 Apr 30 03:16 /var/log/mailman/error
-rw-rw-r-- 1 list list        0 Apr 22 03:16 /var/log/mailman/error.1
-rw-rw-r-- 1 list list        0 Apr 15 03:16 /var/log/mailman/error.2
-rw-rw-r-- 1 list list 36541617 Apr 22 01:59 /var/log/mailman/error.3
Should there be anything relevant in there ?
Configuration
Not sure this is useful, but /etc/mailman/mm_cfg.py contains MTA='LocalPostfix'
The above line should cause significant problems when attempting to create or remove lists. It MUST be one of
MTA = 'Postfix'
MTA = 'Manual'
MTA = None

'Postfix' means generate aliases and virtual-mailman files for Postfix.
'Manual' means display the necessary aliases.
None means don't do anything with aliases when lists are created/removed.
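As a quick sanity check, the MTA line of a configuration file can be compared against the three values listed above. This is a minimal sketch, assuming Python 3 and a simple one-line `MTA = ...` assignment; the helper names are hypothetical, and the allowed set is taken only from the list above, not from the Mailman source.

```python
import ast
import re

# The three values listed above as acceptable for MTA.
ALLOWED_MTA = ("Postfix", "Manual", None)

# A plain "MTA = <literal>" line, optionally followed by a comment.
MTA_RE = re.compile(r"\s*MTA\s*=\s*(.+?)\s*(?:#.*)?$")


def mta_from_config(text):
    """Return the value of the last MTA assignment in mm_cfg.py-style text."""
    found = False
    value = None
    for line in text.splitlines():
        match = MTA_RE.match(line)
        if match:
            value = ast.literal_eval(match.group(1))
            found = True
    if not found:
        raise LookupError("no MTA assignment found")
    return value


def check_mta(value):
    """Return True if value is one of the settings listed above."""
    return value in ALLOWED_MTA


if __name__ == "__main__":
    # Configuration path as given earlier in this thread.
    with open("/etc/mailman/mm_cfg.py") as cfg:
        mta = mta_from_config(cfg.read())
    print(repr(mta), "ok" if check_mta(mta) else "not one of the listed values")
```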
I configured Mailman 3 years ago. I don't remember everything, but that setting comes from here: http://isp-control.net/documentation/howto/mail/setup_mailman
Is it such a bad idea?
I suppose it is unrelated, anyway.
The good thing is that there is a relatively recent bug open on Debian that could be closed if we manage to find the root cause and solve this.
I did a little cleanup tonight after I realized the server was almost full, or at least the partition that hosts the Mailman queues and logs was. Would we see anything specific if the partition had run out of space?
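To keep an eye on this from now on, the free space on that partition can be checked from a small script. This is a sketch only, assuming Python 3; the 5% threshold is an arbitrary illustration, and the path is the log directory mentioned earlier in this thread.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def low_on_space(path, threshold=0.05):
    """Return True if less than `threshold` of the filesystem is free."""
    return free_fraction(path) < threshold


if __name__ == "__main__":
    path = "/var/log/mailman"  # log directory mentioned in this thread
    fraction = free_fraction(path)
    state = "LOW" if low_on_space(path) else "ok"
    print("%s: %.1f%% free (%s)" % (path, fraction * 100, state))
```

Run from cron, this could send a warning before the partition fills up again.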
Thank you for your help.
-- Jérôme