[Mailman-Users] Stuck OutgoingRunner
Hagedorn at uni-koeln.de
Fri Mar 16 07:37:04 EDT 2018
It happened again yesterday. Details below.
--On 7 February 2018 at 12:43:18 +0900 Yasuhito FUTATSUKI
<futatuki at poem.co.jp> wrote:
> In fact,
> On 02/02/18 19:26, Sebastian Hagedorn wrote:
>> [root at mailman3/usr/lib/mailman/bin]$ strace -p 1677
>> Process 1677 attached
>> recvfrom(10, ^CProcess 1677 detached
> indicates the OutgoingRunner process 1677 was still in the recvfrom(2)
> system call (perhaps called from recv(2)) for FD 10, and
>> [root at mailman3/usr/lib/mailman/bin]$ lsof -p 1677
>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
>> python2.7 1677 mailman cwd DIR 253,0 4096 173998 /usr/lib/mailman
>> python2.7 1677 mailman rtd DIR 253,0 4096 2 /
>> python2.7 1677 mailman 10u IPv6 46441320 0t0 TCP
> indicates its FD 10 was an ESTABLISHED connection to the MTA.
The situation was exactly the same this time. We also confirmed on the MTA
that there was no trace of that connection anymore. At the time of the
incident, the MTA was once again under high load and delaying commands. That
definitely seems to be a contributing factor. We didn't find any evidence
of a connection that was dropped by the MTA, but with four OutgoingRunners
we couldn't determine which transaction belonged to which runner.
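
One way to make that matching possible in future incidents might be to log
the local TCP port of each runner's SMTP connection together with its PID.
The sketch below is only an illustration: describe_connection is a
hypothetical helper, not part of Mailman, and it assumes the MTA logs record
the client port (which depends on the MTA configuration). With smtplib, the
underlying socket is available as the sock attribute of the SMTP object.

```python
import os
import socket


def describe_connection(sock):
    """Return a log tag with this process's PID and the socket's endpoints.

    Hypothetical helper: pass it the connected socket of an SMTP session,
    e.g. the `sock` attribute of an smtplib.SMTP instance.
    """
    local = sock.getsockname()   # (host, port, ...) of our end
    peer = sock.getpeername()    # (host, port, ...) of the MTA end
    return "pid=%d local=%s:%d peer=%s:%d" % (
        os.getpid(), local[0], local[1], peer[0], peer[1])
```

Writing this tag to the runner's log right after connecting would give a
client port that can be compared with the port shown by lsof for a stuck
process and, if available, with the MTA's own logs.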
> If the MTA hangs (or makes very slow progress) at the application layer
> while keeping the TCP connection alive at the lower layer, a client using
> smtplib without specifying a timeout, like the current SMTPDirect handler
> in Mailman, must wait for a response or for the MTA to die.
If I understood Mark correctly, the MTA dropping the connection should have
raised socket.error regardless of timeouts. The question is why it didn't.
I suppose that could be a bug either in our version of the Python libraries
or in the OS. Any ideas how we should proceed to determine the root cause?
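
For reference, here is a minimal sketch of what a timeout changes when the
peer stays silent. It does not reproduce our setup: the "MTA" is just a
local listening socket that completes the TCP handshake and then never sends
an SMTP greeting, and probe_silent_peer is a made-up name. The timeout
argument of smtplib.SMTP exists in Python 2.7 as well as Python 3. Depending
on the Python version the failure may surface as socket.timeout or as
smtplib.SMTPServerDisconnected, so the sketch catches both.

```python
import smtplib
import socket


def probe_silent_peer(timeout):
    """Connect to a local peer that never speaks and report what happens."""
    # Listening socket that is never accept()ed: the kernel still completes
    # the TCP handshake, so the client sees an established connection.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    try:
        # With timeout=None this call would block in recv indefinitely,
        # waiting for the 220 greeting that never comes.
        smtplib.SMTP("127.0.0.1", port, timeout=timeout)
    except (smtplib.SMTPServerDisconnected, socket.timeout) as exc:
        return "gave up: %s" % exc
    finally:
        srv.close()
    return "got a greeting"
```

Calling probe_silent_peer(5) should return after roughly five seconds with
a "gave up" message instead of hanging. Note that the timeout applies to
each individual socket operation, not to the SMTP transaction as a whole.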
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.