[Mailman-Users] Admin email on errors?

Mon Mar 20 21:46:17 CET 2006

> This indicates a 'mailmanctl stop' command or some other event resulted
> in a SIGTERM being sent to the running mailmanctl. (And are these messages
> from two and a half days later supposed to be related to those above?)

Nope.  But that's all I saw in the logs that was in the same week.

>> Mar 17 09:34:34 2006 (9271) ArchRunner qrunner caught SIGTERM.
>> Stopping.
...
> And these are the result of the subsequent normal shutdown.

Ah - that'd be a reboot, then.

>>> And, if not and mailmanctl died ungracefully, it's unlikely to be
>>> able to successfully send you an email about the situation.
>>
>> You're telling me that it has the presence of mind to write an error
>> log, but can't do the equivalent of echo mailman died | mail -s 'mailman
>> died' $ADMIN
>>
>> I'm not buying that.
>
> No. I'm telling you that your original post included nothing about why
> mailmanctl or any qrunner stopped in the first place, thus I had no
> evidence that anything was written when it stopped, only the messages from
> subsequent start attempts and the fact that if it had in fact died, it did
> so without removing the lock file.

Ah-ha!  Now I see.  That was a start failure, not a death failure. 
Gotcha.  So I'm now asking 2 questions: what fired mailmanctl at that
time, and why didn't it send the admin mail when it failed to start up?

> So were there log messages about the original termination prior to the
> start attempts at Mar 14 22:41:58 2006 and Mar 14 22:56:02 2006?

Nope.

>>>> Has anyone
>>>> hacked it in?  Do I have to write a cron job that will poll the
>>>> process list to see if mailman is still running?  Has anyone written
>>>> that already?
>>>
>>> There are posts in the mailman-users archives about this.
>>>
>>
>> I did some searching, but couldn't find them.
>
> I know I bring this on myself by doing it so much, but I don't like
> being used as a search engine for the Mailman FAQ and mailman-users
> archives.

I appreciate you stooping.  Nowhere in the FAQ does it mention the error I
quoted.  Nor does it mention anything about monitoring for failure.  Nor
does the admin section of the faq seem to say anything about notification
on error.

> Try the entire thread that begins at
> <http://mail.python.org/pipermail/mailman-users/2005-May/044888.html>
> which I found fairly quickly with
> <http://www.google.com/search?q=site:mail.python.org++inurl:mailman-users
> ++cron+restart+mailmanctl>.

Which I would never have found, since I was searching for mailman,
notification, errors, and the error string I mentioned.  There is a thread
that mentions the same error (also on OSX), but it didn't seem to reach a
resolution.

>>> The real solution is to find the underlying problem and fix it so
>>> Mailman doesn't die.
>>
>> I agree that software shouldn't crash.  I disagree that it won't crash.
>> I insist that when server software crashes, it should send mail to an
>> admin.
>
> It's an open source project. We're all volunteers. Feel free to
> implement whatever you need. Insisting that others do the work won't get
> you very far.

I don't insist that others implement it - but I figure it's worth asking
if anyone has already done the work before I roll it out.
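
For what it's worth, a minimal sketch of the kind of cron check discussed
above might look like the following.  It is written for present-day Python
3, not whatever ships with Mailman.  The pid file path is passed in because
its location is an assumption that depends on the install, and check() and
notify are hypothetical names for illustration, not anything from Mailman
itself.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def master_is_running(pid_file):
    """Read a pid from pid_file and report whether that process is alive."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or garbled pid file: treat as not running.
        return False
    return pid_is_alive(pid)


def check(pid_file, notify):
    """Call notify(message) if the process named in pid_file is gone."""
    if master_is_running(pid_file):
        return True
    notify("mailman master process not running (pid file: %s)" % pid_file)
    return False
```

Run from cron, notify could be anything that sends mail to the admin.  A
check like this only reports that the process is gone; it says nothing
about why.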

From a casual inspection of mailmanctl, it looks like the exception thrown
is not caught, which makes me wonder what the exit code is (I'm not much of
a Python coder).  Depending on what called mailmanctl and triggered the
exception in the first place, it seems like returning an error code would
be enough to tell the caller to make noise.
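
On the exit code question: a Python process that dies from an uncaught
exception prints the traceback to stderr and exits with status 1.  The
sketch below (present-day Python 3) shows a caller picking that up.  The
failing child process is only a stand-in for mailmanctl, and
run_and_report and notify are hypothetical names for illustration.

```python
import subprocess
import sys


def run_and_report(cmd, notify):
    """Run cmd; if it exits non-zero, pass a short report to notify()."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        notify("%s exited with status %d\n%s"
               % (" ".join(cmd), result.returncode, result.stderr))
    return result.returncode


# Stand-in for mailmanctl: a child Python that raises and never catches.
failing = [sys.executable, "-c", "raise RuntimeError('lock file exists')"]
reports = []
status = run_and_report(failing, reports.append)
```

Here status ends up as 1 and reports holds one message containing the
traceback, so whatever starts mailmanctl could in principle do the same
and mail the result.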

Thanks,
Kurt
-- 
kwerle at pobox.com
http://www.pobox.com/~kwerle/
Tired of spam? Control your Mailserver (or .forward)?
http://tess.sf.net