[Mailman-Users] Mailman stuck : mailmanctl dead with messages in /qfiles/in
Jérôme
jerome at jolimont.fr
Tue May 1 02:19:21 CEST 2012
Hi.
Thanks for answering.
On Mon, 30 Apr 2012 16:15:03 -0700,
Mark Sapiro wrote:
> > 2/ Cron/mailmanctl
> >
> > ps auxww| grep mailmanctl |grep -v grep
> > -> Nothing.
>
> How about
>
> ps auxww| grep qrunner |grep -v grep
Nothing either.
> > 7/ Locks
> >
> > /var/lib/mailman/locks -> /var/lock/mailman
> >
> > ll /var/lock/mailman
> > total 0
>
> It appears that some process or person is stopping Mailman.
OK. I need to figure out which.
> > 8/ Logs
> >
> > /var/log/mailman/error :
> > Apr 30 03:16:21 2012 mailmanctl(11685): No child with pid: 17093
> > Apr 30 03:16:21 2012 mailmanctl(11685): [Errno 3] No such process
> > Apr 30 03:16:21 2012 mailmanctl(11685): Stale pid file removed.
>
>
> How about /var/log/mailman/qrunner ?
Each day, I get something like this:
Apr 28 03:16:33 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:33 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
Apr 28 03:16:34 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs.
Apr 28 03:16:34 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.
On the day it stopped, I got this:
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17093) Master watcher caught SIGHUP. Re-opening log files.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner exiting.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17099) OutgoingRunner qrunner exiting.
Apr 29 03:16:29 2012 (17094) ArchRunner qrunner exiting.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGHUP. Reopening logs.
Apr 29 03:16:29 2012 (17096) CommandRunner qrunner exiting.
Apr 29 03:16:29 2012 (17098) NewsRunner qrunner exiting.
Apr 29 03:16:29 2012 (17095) BounceRunner qrunner exiting.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner caught SIGTERM. Stopping.
Apr 29 03:16:29 2012 (17101) RetryRunner qrunner exiting.
Apr 29 03:16:29 2012 (17100) VirginRunner qrunner exiting.
Sorry for the mess, here. But I think you get the idea.
It seems to happen during a cron job.
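To make the excerpts above easier to compare, here is a minimal sketch that lists which signals each process logged. The regular expression is only inferred from the entries quoted in this message, not taken from Mailman itself, and the names ENTRY and signals_by_process are made up for illustration. Whitespace is normalised first so that entries wrapped by a mail client still parse.

```python
import re
from collections import defaultdict

# Assumed entry shape, inferred from the excerpts in this message, e.g.
# "Apr 29 03:16:29 2012 (17097) IncomingRunner qrunner caught SIGTERM. Stopping."
ENTRY = re.compile(
    r"\w{3} \d+ \d\d:\d\d:\d\d \d{4} "      # timestamp
    r"\((?P<pid>\d+)\) "                     # process id in parentheses
    r"(?P<name>\w+ \w+) caught (?P<signal>SIG[A-Z]+)"
)


def signals_by_process(text):
    """Map (pid, name) to the signals it logged, in order of appearance."""
    # Collapse line breaks and runs of spaces so wrapped entries still match.
    flat = " ".join(text.split())
    seen = defaultdict(list)
    for match in ENTRY.finditer(flat):
        key = (int(match.group("pid")), match.group("name"))
        seen[key].append(match.group("signal"))
    return dict(seen)


if __name__ == "__main__":
    # Path taken from this thread; adjust it if the log lives elsewhere.
    with open("/var/log/mailman/qrunner") as handle:
        for (pid, name), signals in sorted(signals_by_process(handle.read()).items()):
            print(pid, name, " ".join(signals))
```

Entries that only say "exiting" are skipped on purpose, since they carry no signal name.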
Bug reports that could be related:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=505638
https://bugs.launchpad.net/mailman/+bug/265855
> > modified
> > /var/lib/mailman/Mailman/Handlers/SMTPDirect.py
> > to add
> > self.__conn.set_debuglevel(1)
>
> And yet you are not logging any smtp debugging in Mailman's error log.
> There should be copious log information for every outgoing message.
There was, but it stopped. The last message for which I have a lot of info is
from Apr 22, one week before Mailman stopped sending messages.
-rw-rw-r-- 1 list list 198 Apr 30 03:16 /var/log/mailman/error
-rw-rw-r-- 1 list list 0 Apr 22 03:16 /var/log/mailman/error.1
-rw-rw-r-- 1 list list 0 Apr 15 03:16 /var/log/mailman/error.2
-rw-rw-r-- 1 list list 36541617 Apr 22 01:59 /var/log/mailman/error.3
Should there be anything relevant in there?
> > Configuration
> > -------------
> >
> > Not sure this is useful, but
> > /etc/mailman/mm_cfg.py contains
> > MTA='LocalPostfix'
>
> The above line should cause significant problems when attempting to
> create or remove lists. It MUST be one of
>
> MTA = 'Postfix'
> MTA = 'Manual'
> MTA = None
>
> 'Postfix' means generate aliases and virtual-mailman files for Postfix.
> 'Manual' means display the necessary aliases
> None means don't do anything with aliases when lists are created/removed.
I configured Mailman 3 years ago. I don't remember everything, but that setting
comes from here:
http://isp-control.net/documentation/howto/mail/setup_mailman
Is it such a bad idea?
I suppose it is unrelated, anyway.
The good thing is that there is a relatively recent bug open on Debian that
might be closed if we manage to find the root cause and solve this.
I just did a bit of cleanup tonight, after I realized the server was almost
full, or at least the partition that hosts the Mailman queues and logs. Would
we see something specific if it ran out of space?
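For reference, a minimal sketch of how the free space on those partitions could be checked from Python. It says nothing about how Mailman itself behaves on a full disk. The paths are the ones mentioned in this thread; the 5% threshold and the function names are arbitrary choices for illustration.

```python
import shutil


def free_fraction(path):
    """Return the free share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def low_space(paths, threshold=0.05):
    """Return {path: free fraction} for every path below the threshold."""
    result = {}
    for path in paths:
        try:
            fraction = free_fraction(path)
        except OSError:
            # Path missing or unreadable on this host: skip it.
            continue
        if fraction < threshold:
            result[path] = fraction
    return result


if __name__ == "__main__":
    # Directories named in this thread; the threshold is an assumption.
    for path, fraction in low_space(["/var/lib/mailman", "/var/log/mailman"]).items():
        print("%s: only %.1f%% free" % (path, fraction * 100))
```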
Thank you for your help.
--
Jérôme