[Mailman-Users] Admin email on errors?

Kurt Werle kwerle at pobox.com
Mon Mar 20 18:09:04 CET 2006


<quote who="Mark Sapiro">
> Kurt Werle wrote:
>
>
>> I have twice got the following error:
>> ---
>> Mar 14 22:41:58 2006 mailmanctl(54): The master qrunner lock could not
>> be acquired.  It appears as though there is a stale master qrunner lock.
>> Try re-running mailmanctl with the -s flag.
>> Mar 14 22:41:58 2006 mailmanctl(54):
>> Mar 14 22:56:02 2006 mailmanctl(48): The master qrunner lock could not
>> be acquired, because it appears as if some process on some other host may
>> have acquired it.  We can't test for stale locks across host boundaries,
>> so you'll have to do this manually.  Or, if you know the lock is stale,
>> re-run mailmanctl with the -s flag.
>> ---
>
> These errors are the result of a 'mailmanctl start' when mailmanctl was
> either already running or died in some 'unclean' way. Are there any
> messages in the 'error' or 'qrunner' logs that might illuminate the
> problem that caused you to want to do the 'mailmanctl start' in the first
> place?

Not that I can see - though it looks like it did some thrashing AFTER
writing that log.

qrunner...
Mar 17 09:18:22 2006 (9277) VirginRunner qrunner started.
Mar 17 09:18:22 2006 (9276) OutgoingRunner qrunner started.
Mar 17 09:18:23 2006 (9274) IncomingRunner qrunner started.
Mar 17 09:18:23 2006 (9275) NewsRunner qrunner started.
Mar 17 09:18:23 2006 (9278) RetryRunner qrunner started.
Mar 17 09:18:23 2006 (9271) ArchRunner qrunner started.
Mar 17 09:18:23 2006 (9273) CommandRunner qrunner started.
Mar 17 09:18:23 2006 (9272) BounceRunner qrunner started.
Mar 17 09:34:34 2006 (9270) Master watcher caught SIGTERM.  Exiting.
Mar 17 09:34:34 2006 (9271) ArchRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9271) ArchRunner qrunner exiting.
Mar 17 09:34:34 2006 (9272) BounceRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9272) BounceRunner qrunner exiting.
Mar 17 09:34:34 2006 (9273) CommandRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9273) CommandRunner qrunner exiting.
Mar 17 09:34:34 2006 (9274) IncomingRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9274) IncomingRunner qrunner exiting.
Mar 17 09:34:34 2006 (9275) NewsRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9275) NewsRunner qrunner exiting.
Mar 17 09:34:34 2006 (9276) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9276) OutgoingRunner qrunner exiting.
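
One way to check a qrunner log like the one above for runners that started
but never logged an exit. This is only a rough sketch: the regular expression
is inferred from the line format in this excerpt, and the function name
unfinished_runners is made up for illustration. A runner with no exit line may
just mean the paste was cut short, not that the process hung.

```python
import re

# Matches lines such as:
#   Mar 17 09:18:22 2006 (9277) VirginRunner qrunner started.
#   Mar 17 09:34:34 2006 (9271) ArchRunner qrunner exiting.
# The pattern is an assumption based on the excerpt above.
LINE = re.compile(r"\((\d+)\) (\w+) qrunner (started|exiting)\.")


def unfinished_runners(log_text):
    """Return {pid: name} for runners that logged 'started' but not 'exiting'."""
    alive = {}
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match is None:
            continue  # e.g. 'caught SIGTERM' lines and the master watcher
        pid, name, event = match.groups()
        if event == "started":
            alive[int(pid)] = name
        else:
            alive.pop(int(pid), None)
    return alive


sample = """\
Mar 17 09:18:22 2006 (9277) VirginRunner qrunner started.
Mar 17 09:18:22 2006 (9276) OutgoingRunner qrunner started.
Mar 17 09:34:34 2006 (9276) OutgoingRunner qrunner caught SIGTERM.  Stopping.
Mar 17 09:34:34 2006 (9276) OutgoingRunner qrunner exiting.
"""
print(unfinished_runners(sample))  # {9277: 'VirginRunner'}
```

Run against the full excerpt above, this would report VirginRunner (9277) and
RetryRunner (9278), since neither has an exit line in what I pasted.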


> And, if not and mailmanctl died ungracefully, it's unlikely to be
> able to successfully send you an email about the situation.

You're telling me that it has the presence of mind to write an error log,
but can't do the equivalent of this?
echo mailman died | mail -s 'mailman died' "$ADMIN"

I'm not buying that.

>> Has anyone
>> hacked it in?  Do I have to write a cron job that will poll the process
>> list to see if mailman is still running?  Has anyone written that
>> already?
>
> There are posts in the mailman-users archives about this.

I did some searching, but couldn't find them.

> The real solution is to find the underlying problem and fix it so
> Mailman doesn't die.

I agree that software shouldn't crash.  I disagree that it won't crash.  I
insist that when server software crashes, it should send mail to an admin.
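
For the cron-job route, a minimal sketch of a watchdog that mails an admin
when the master process is gone. The PID file path and the admin address are
assumptions to adjust for your install (Mailman 2.1 normally keeps
master-qrunner.pid in its data directory), and the function names are made up
for illustration. It relies on a local mail command and on Unix signal
semantics.

```python
#!/usr/bin/env python3
"""Cron watchdog sketch: mail the admin if the Mailman master is not running."""
import os
import subprocess
import sys

# Assumptions, not taken from this thread: adjust both for your install.
PIDFILE = "/usr/local/mailman/data/master-qrunner.pid"
ADMIN = "root"


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def notify(admin, body):
    """Send the alert through the local mail command, if one is available."""
    try:
        subprocess.run(["mail", "-s", "mailman died", admin],
                       input=body.encode(), check=False, timeout=30)
    except (OSError, subprocess.SubprocessError) as exc:
        print("could not send alert: %s" % exc, file=sys.stderr)


def check(pidfile=PIDFILE, admin=ADMIN, send=notify):
    """Return True if Mailman looks alive; otherwise alert and return False."""
    pid = read_pid(pidfile)
    if is_running(pid):
        return True
    send(admin, "No live process found for %s (pid %r)\n" % (pidfile, pid))
    return False


if __name__ == "__main__":
    check()
```

Run it from cron every few minutes. Two caveats: a stale PID can be reused by
an unrelated process, so this can miss a death, and it will keep mailing on
every run until Mailman is restarted.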

Kurt
-- 
kwerle at pobox.com
http://www.pobox.com/~kwerle/
Tired of spam? Control your Mailserver (or .forward)?
http://tess.sf.net




