Re: [Mailman-Users] Stuck OutgoingRunner

16 Mar 2018


It happened again yesterday. Details below.
--On 7 February 2018 at 12:43:18 +0900 Yasuhito FUTATSUKI
futatuki@poem.co.jp wrote:
...
In fact,
On 02/02/18 19:26, Sebastian Hagedorn wrote:
...
[root@mailman3/usr/lib/mailman/bin]$ strace -p 1677
Process 1677 attached
recvfrom(10, ^CProcess 1677 detached
indicates that OutgoingRunner process 1677 was still in the recvfrom(2)
system call (perhaps called from recv(2)) on FD 10, and
...
[root@mailman3/usr/lib/mailman/bin]$ lsof -p 1677
COMMAND    PID    USER   FD   TYPE   DEVICE SIZE/OFF   NODE NAME
python2.7 1677 mailman  cwd    DIR    253,0     4096 173998 /usr/lib/mailman
python2.7 1677 mailman  rtd    DIR    253,0     4096      2 /
...
python2.7 1677 mailman   10u  IPv6 46441320      0t0    TCP mailman3.rrz.uni-koeln.de:55764->smtp-out.rrz.uni-koeln.de:smtp (ESTABLISHED)
indicates that its FD 10 was an ESTABLISHED connection to the MTA.
That situation was exactly the same. This time we confirmed on the MTA that
there was no trace of that connection anymore. At the time of the incident,
the MTA was once again under high load and delaying commands. That
definitely seems to be a contributing factor. We didn't find any evidence
of a connection that was dropped by the MTA, but with four OutgoingRunners
we didn't find a way to determine which transaction related to which runner.
...
If the MTA hangs (or makes only very slow progress) at the application layer
while keeping the TCP connection alive at the lower layer, a client that uses
smtplib without specifying a timeout, like the current SMTPDirect handler in
Mailman, must wait until the MTA responds or dies.
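
The effect of the timeout can be reproduced in isolation. The following is
only a minimal sketch, not Mailman's SMTPDirect code: it assumes a local
listener that completes the TCP handshake but never sends the SMTP greeting,
standing in for an MTA that is alive at the TCP layer and silent at the
application layer. The helper names start_silent_listener and
greeting_times_out are hypothetical.

```python
import smtplib
import socket
import time


def start_silent_listener():
    """Listen on a free localhost port and never send anything.

    The kernel completes the TCP handshake for connections waiting in the
    listen queue, so a client sees an established connection even though
    nothing here ever calls accept() or writes a greeting.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv


def greeting_times_out(port, timeout):
    """Return True if smtplib gave up waiting for the SMTP greeting."""
    try:
        # With timeout set, the blocking read of the greeting is bounded.
        smtplib.SMTP("127.0.0.1", port, timeout=timeout)
    except (smtplib.SMTPServerDisconnected, socket.timeout):
        # smtplib wraps a socket timeout during a reply read in
        # SMTPServerDisconnected; socket.timeout is caught as well
        # in case it surfaces unwrapped.
        return True
    return False


if __name__ == "__main__":
    srv = start_silent_listener()
    port = srv.getsockname()[1]
    start = time.time()
    print("timed out:", greeting_times_out(port, timeout=1.0))
    print("waited %.1f seconds" % (time.time() - start))
    srv.close()
```

If the timeout argument is left out and no default socket timeout has been
set, the same call blocks in recv for as long as the peer stays silent,
which is consistent with the strace output above.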
If I understood Mark correctly, when the MTA dropped the connection, that
should have raised socket.error regardless of timeouts. The question is why
it didn't. I suppose that could be either a bug in our version of the
Python libraries or in the OS. Any ideas how we should proceed to determine
the root cause?
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
             .:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.