On 02/02/2018 02:26 AM, Sebastian Hagedorn wrote:
Hi,
We've been running Mailman for many years without stability issues. About a month ago we moved the server from RHEL 5 to RHEL 6 and upgraded to the current version (2.1.25). Since then, one of our four OutgoingRunners has twice got "stuck" and stopped handling mail. When that happens, a simple restart of the service does not work; these processes remained:
mailman 1663 0.0 0.0 233860 2204 ? Ss Jan16 0:00 /usr/bin/python2.7 /usr/lib/mailman/bin/mailmanctl -s -q start
mailman 1677 0.1 0.9 295064 73284 ? S Jan16 35:35 /usr/bin/python2.7 /usr/lib/mailman/bin/qrunner --runner=OutgoingRunner:3:4 -s
That is because pid 1677 didn't respond to the SIGINT from the master, and the master is still waiting for it to exit.
[root@mailman3/usr/lib/mailman/bin]$ strace -p 1677
Process 1677 attached
recvfrom(10, ^CProcess 1677 detached

[root@mailman3/usr/lib/mailman/bin]$ lsof -p 1677
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python2.7 1677 mailman cwd DIR 253,0 4096 173998 /usr/lib/mailman
python2.7 1677 mailman rtd DIR 253,0 4096 2 /
...
python2.7 1677 mailman 10u IPv6 46441320 0t0 TCP mailman3.rrz.uni-koeln.de:55764->smtp-out.rrz.uni-koeln.de:smtp (ESTABLISHED)
In both instances the OutgoingRunner was stuck on an SMTP connection. I had to use "kill -9" to get rid of it.
Any ideas what might be causing that?
I think I've seen this once or maybe twice, but I don't recall the details. I wasn't able to determine a cause, and I haven't seen it in years.
Did you look at the out queue, and if so, was there a .bak file there? That would be the entry currently being processed.
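If it helps, the check for a .bak file can be scripted. The following is only a rough sketch: the function name find_bak_files is made up for illustration, and the queue path /var/spool/mailman/out is an assumption about a packaged install, so substitute whatever your QUEUE_DIR actually is.

```python
import os
import time


def find_bak_files(queue_dir):
    """Return (filename, seconds since last modification) for each .bak file."""
    now = time.time()
    found = []
    for name in sorted(os.listdir(queue_dir)):
        if name.endswith(".bak"):
            path = os.path.join(queue_dir, name)
            found.append((name, now - os.path.getmtime(path)))
    return found


if __name__ == "__main__":
    # Assumed location of the out queue; adjust to the site's QUEUE_DIR.
    queue_dir = "/var/spool/mailman/out"
    if os.path.isdir(queue_dir):
        for name, age in find_bak_files(queue_dir):
            print("%s  (last modified %.0f seconds ago)" % (name, age))
    else:
        print("No such directory: %s" % queue_dir)
```

A .bak file whose modification time is far older than a normal delivery would take is a hint that its runner stopped making progress on that entry.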
Also, the TCP connection to the MTA being ESTABLISHED says the OutgoingRunner has called SMTPDirect.process() and it in turn is somewhere in its delivery loop of sending SMTP transactions.
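To illustrate what a read that never returns on an ESTABLISHED connection looks like, here is a small stand-alone sketch. It is not Mailman code and does not show the cause of this particular hang, which is undetermined; it only shows the general behaviour of a socket read against a peer that stays silent, and how a socket timeout turns an indefinite wait into an exception. The function name read_with_timeout and the local test server are purely illustrative.

```python
import socket
import threading


def read_with_timeout(timeout_seconds):
    """Connect to a local server that never replies; report whether recv timed out."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    accepted = []

    def accept_and_stay_silent():
        # Accept the connection but never send anything back.
        conn, _ = server.accept()
        accepted.append(conn)

    worker = threading.Thread(target=accept_and_stay_silent)
    worker.start()

    # The timeout applies to the connect and to later reads on this socket.
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout_seconds)
    try:
        client.recv(1024)  # without a timeout this would block indefinitely
        timed_out = False
    except socket.timeout:
        timed_out = True
    finally:
        client.close()
        worker.join()
        for conn in accepted:
            conn.close()
        server.close()
    return timed_out


if __name__ == "__main__":
    print("recv timed out:", read_with_timeout(0.5))
```

With no timeout set, the same recv call would simply sit there, which is consistent with what a process parked in recvfrom looks like from the outside.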
Are there any clues in the MTA logs?
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan