Hi,
We've been running Mailman for many years without any stability issues. About a month ago we moved the server from RHEL 5 to RHEL 6 and upgraded to the current version (2.1.25). Since then, one of our four OutgoingRunners has twice got "stuck" and stopped handling mail. When that happens, a simple restart of the service does not work. These processes remained:
mailman 1663 0.0 0.0 233860 2204 ? Ss Jan16 0:00 /usr/bin/python2.7 /usr/lib/mailman/bin/mailmanctl -s -q start
mailman 1677 0.1 0.9 295064 73284 ? S Jan16 35:35 /usr/bin/python2.7 /usr/lib/mailman/bin/qrunner --runner=OutgoingRunner:3:4 -s
[root@mailman3/usr/lib/mailman/bin]$ strace -p 1677
Process 1677 attached
recvfrom(10, ^CProcess 1677 detached

[root@mailman3/usr/lib/mailman/bin]$ lsof -p 1677
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python2.7 1677 mailman cwd DIR 253,0 4096 173998 /usr/lib/mailman
python2.7 1677 mailman rtd DIR 253,0 4096 2 /
...
python2.7 1677 mailman 10u IPv6 46441320 0t0 TCP mailman3.rrz.uni-koeln.de:55764->smtp-out.rrz.uni-koeln.de:smtp (ESTABLISHED)
In both instances the OutgoingRunner was stuck on an SMTP connection. I had to use "kill -9" to get rid of it.
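For illustration: a recvfrom that never returns, as in the strace output, is what a blocking socket read without a timeout looks like when the peer goes silent. Below is a minimal Python sketch of that behaviour against a local dummy server. It is not Mailman's code; the names silent_server, read_with_timeout and demo are made up for this example, and whether a missing timeout is really what happens in the OutgoingRunner is only an assumption.

```python
import socket
import threading


def silent_server(listener, stop):
    """Hypothetical stalled peer: accept one connection and never reply."""
    conn, _ = listener.accept()
    stop.wait()  # keep the connection open without sending anything
    conn.close()


def read_with_timeout(host, port, timeout):
    """Read from the peer; return "timeout" if nothing arrives in time.

    With timeout=None the recv() call would block indefinitely,
    which matches a process sitting in recvfrom().
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return "timeout"
    finally:
        sock.close()


def demo(timeout=0.5):
    """Start the silent server on a free local port and read from it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    stop = threading.Event()
    worker = threading.Thread(target=silent_server, args=(listener, stop))
    worker.daemon = True
    worker.start()
    try:
        port = listener.getsockname()[1]
        return read_with_timeout("127.0.0.1", port, timeout)
    finally:
        stop.set()
        worker.join(1)
        listener.close()


if __name__ == "__main__":
    print(demo())
```

With a timeout set, the read gives up after the given number of seconds instead of hanging; without one, the process would have to be killed from outside.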
Any ideas what might be causing that?
Cheers,
Sebastian
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.