[Mailman-Users] Kernel update breaks Mailman!!

Lindsay Haisley fmouse at fmp.com
Thu Feb 20 23:04:28 CET 2014


On Thu, 2014-02-20 at 10:37 -0800, Mark Sapiro wrote:
> On 02/20/2014 10:07 AM, Lindsay Haisley wrote:
> > I'm running Mailman 2.1.15 on an Ubuntu server, feeding into Courier MTA,
> > running Python 2.7.3.  I track security updates and install them
> > promptly when they're issued by Ubuntu.  Yesterday I updated the Linux
> > kernel from 3.2.0-58-generic (x86_64) to 3.2.0-59-generic and Mailman
> > quit working.  List posts made it through to the archives, and were
> > apparently queued within Mailman, but wouldn't go out.  The mail server
> > was working OK for non-list email. Today I backed out the kernel update
> > and posts to lists sent yesterday and today are going out without
> > problems.
> 
> 
> What's in Mailman's 'post' and 'smtp' logs for these messages? Are they
> timestamped before or after you backed out the update? If before, they
> were queued in the MTA. If after, they were in Mailman's 'out' queue.

They weren't in the MTA's queue.  Watching the count of messages in
the MTA queue was how I determined that Mailman wasn't delivering list
posts to the MTA.  I restarted qrunner and it didn't make any
difference.  The mail queue had about 67 messages in it.  This would
go up to 68 or 69 at times and then fall back down again, which is normal
behavior.  I could send and receive mail.  All indications are that the
MTA was working normally.  Our Mailman lists run from several hundred to a
couple of thousand subscribers, and when someone posts to a list the MTA
mail queue shoots up to hundreds of messages, with VERP sender addresses
shown in the queue summary, and then works its way back down.

> If the latter, what's in Mailman's 'qrunner' log related to OutgoingRunner?

Here's a sampling of the qrunner log from the wee hours, before I
started poking at the problem to try to fix it:

Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2447) IncomingRunner qrunner exiting.
Feb 20 03:22:02 2014 (2445) BounceRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2445) BounceRunner qrunner exiting.
Feb 20 03:22:02 2014 (2446) CommandRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2446) CommandRunner qrunner exiting.
Feb 20 03:22:02 2014 (2451) RetryRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2443) Master watcher caught SIGINT.  Restarting.
Feb 20 03:22:02 2014 (2444) ArchRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2444) ArchRunner qrunner exiting.
Feb 20 03:22:02 2014 (2448) NewsRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2450) VirginRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2449) OutgoingRunner qrunner caught SIGINT.  Stopping.
Feb 20 03:22:02 2014 (2451) RetryRunner qrunner exiting.
Feb 20 03:22:02 2014 (2448) NewsRunner qrunner exiting.
Feb 20 03:22:02 2014 (2450) VirginRunner qrunner exiting.
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2445, sig: None, sts: 2, class: BounceRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2446, sig: None, sts: 2, class: CommandRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2451, sig: None, sts: 2, class: RetryRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2448, sig: None, sts: 2, class: NewsRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2444, sig: None, sts: 2, class: ArchRunner, slice: 1/1) [restarting]
Feb 20 03:22:02 2014 (2443) Master qrunner detected subprocess exit
(pid: 2447, sig: None, sts: 2, class: IncomingRunner, slice: 1/1) [restarting]
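
Since that's only a sampling, it's easy to miss which runners are *absent*.
Here's a rough Python sketch for checking a fuller copy of the log.  It glues
the wrapped "(pid: ...)" continuation lines back onto their entries, then
lists runners that caught SIGINT but never logged "exiting", and runners the
Master never reported as restarting.  The line format is inferred from the
excerpt above, not from Mailman's source, and the helper names are my own.

```python
import re

# Format inferred from the excerpt above (an assumption, not Mailman's spec):
# "Feb 20 03:22:02 2014 (pid) ClassName message..."
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \d{4} ")
ENTRY = re.compile(r"\((?P<pid>\d+)\) (?P<cls>\w+) (?P<msg>.*)$")
RESTARTED = re.compile(r"class: (?P<cls>\w+),.*\[restarting\]")


def join_wrapped(lines):
    """Glue continuation lines (no leading timestamp) onto the previous entry."""
    entries = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def summarize(lines):
    """Report runners that never logged 'exiting' or were never restarted."""
    caught, exited, restarted = set(), set(), set()
    for entry in join_wrapped(lines):
        m = ENTRY.search(entry)
        if m is None:
            continue
        cls, msg = m.group("cls"), m.group("msg")
        if cls == "Master":
            # Only "detected subprocess exit ... [restarting]" lines name a runner.
            r = RESTARTED.search(msg)
            if r:
                restarted.add(r.group("cls"))
        elif "caught SIGINT" in msg:
            caught.add(cls)
        elif msg.endswith("exiting."):
            exited.add(cls)
    return {
        "no_exit_logged": sorted(caught - exited),
        "not_restarted": sorted(caught - restarted),
    }
```

Run against just the lines pasted above, it reports OutgoingRunner under
"no_exit_logged" and both OutgoingRunner and VirginRunner under
"not_restarted", though that may only reflect where I cut the sampling.
To check the whole file, pass it an open file object for the qrunner log
(on a packaged Ubuntu install that's usually under /var/log/mailman/, but
verify the path on your own system).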

FWIW, another very strange thing happened after the kernel upgrade,
totally unrelated to mail.  I run bind9 on the same server, and it
provides recursive DNS for all our in-house boxes, with queries coming
from our LAN through our VPN to our server.  This has been working fine
for some time, but after the kernel upgrade it quit working.  The bind9
config specifies that if there's no interface ACL in the bind config,
bind listens on ALL interfaces.  There was an interface ACL for IPv6 but
none for IPv4.  After the upgrade, bind no longer worked as our recursive
server UNLESS I provided an IPv4 interface ACL, which I did, and it
started working again.  Go figure.


-- 
Lindsay Haisley       | "Everything works if you let it"
FMP Computer Services |
512-259-1190          |          --- The Roadie
http://www.fmp.com    |
