[Mailman-Users] Stuck OutgoingRunner

Sebastian Hagedorn Hagedorn at uni-koeln.de
Sat Feb 3 04:03:56 EST 2018


Thanks for your reply!

> On 02/02/2018 02:26 AM, Sebastian Hagedorn wrote:
>> [root at mailman3/usr/lib/mailman/bin]$ lsof -p 1677
>> COMMAND    PID    USER   FD   TYPE   DEVICE SIZE/OFF   NODE NAME
>> python2.7 1677 mailman  cwd    DIR    253,0     4096 173998 /usr/lib/mailman
>> python2.7 1677 mailman  rtd    DIR    253,0     4096      2 /
>> ...
>> python2.7 1677 mailman   10u  IPv6 46441320      0t0    TCP mailman3.rrz.uni-koeln.de:55764->smtp-out.rrz.uni-koeln.de:smtp (ESTABLISHED)
>>
>> In both instances the OutgoingRunner was stuck on an SMTP connection. I
>> had to use "kill -9" to get rid of it.
>>
>> Any ideas what might be causing that?
>
> I think I've seen this once or maybe twice, but I don't recall details. I
> wasn't able to determine a cause. I haven't seen it in years.
>
> Did you look at the out queue, and if so, was there a .bak file there?
> This would be the entry currently being processed.

I looked at the out queue, and there was no .bak file.
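
Checking for a .bak file, and judging how long entries have been waiting, can be scripted. The Python sketch below lists the files in a queue directory oldest first by modification time and flags any .bak entry, which, as described above, is the one a runner is currently processing. `queue_entries` is a hypothetical helper, not part of Mailman, and the queue directory path is an assumption that depends on the installation.

```python
import os
import time


def queue_entries(qdir, now=None):
    """Return (name, age_in_seconds, is_bak) for each queue file, oldest first.

    Hypothetical helper: a '.bak' entry is assumed to be one that a runner
    has dequeued and is currently processing.
    """
    now = time.time() if now is None else now
    entries = []
    for name in os.listdir(qdir):
        path = os.path.join(qdir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        age = now - os.path.getmtime(path)
        entries.append((name, age, name.endswith(".bak")))
    # Largest age first, i.e. the longest-waiting file at the head of the list.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries
```

For example, `queue_entries("/path/to/qfiles/out")`, substituting the installation's actual out-queue directory, shows how long the oldest entry has been waiting.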

> Also, the TCP connection to the MTA being ESTABLISHED says the
> OutgoingRunner has called SMTPDirect.process() and it in turn is
> somewhere in its delivery loop of sending SMTP transactions.
>
> Are there any clues in the MTA logs?
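
An ESTABLISHED connection can pin the runner indefinitely because a blocking socket with no timeout can wait forever if the server stops responding mid-transaction. The sketch below is not Mailman's SMTPDirect code. It is a hypothetical Python 3 delivery loop using the standard smtplib module, where every socket operation is bounded by `timeout` (an assumed default of 60 seconds), so a stalled MTA raises an exception that is recorded as a failure instead of hanging. `deliver_chunks` and `smtp_factory` are illustrative names.

```python
import smtplib


def deliver_chunks(chunks, sender, msg, host="localhost", port=25,
                   timeout=60.0, smtp_factory=smtplib.SMTP):
    """Send msg to each recipient chunk over one SMTP connection.

    Hypothetical sketch. Returns a dict mapping each failed recipient to an
    error description.
    """
    failures = {}

    def fail(rcpts, err):
        for rcpt in rcpts:
            failures[rcpt] = str(err)

    try:
        # The timeout applies to connect() and to every later socket read/write.
        conn = smtp_factory(host, port, timeout=timeout)
    except OSError as err:  # e.g. [Errno 111] Connection refused, or a timeout
        for chunk in chunks:
            fail(chunk, err)
        return failures

    try:
        for i, chunk in enumerate(chunks):
            try:
                refused = conn.sendmail(sender, chunk, msg)
            except smtplib.SMTPRecipientsRefused as err:
                refused = err.recipients  # every recipient in the chunk refused
            except smtplib.SMTPResponseException as err:
                fail(chunk, err)  # e.g. sender refused or DATA rejected
                continue
            except OSError as err:
                # Timeout, disconnect or other socket error (smtplib errors are
                # OSError subclasses): the connection is no longer usable, so
                # fail this chunk and all remaining ones.
                for rest in chunks[i:]:
                    fail(rest, err)
                break
            for rcpt, (code, resp) in refused.items():
                failures[rcpt] = "%s %r" % (code, resp)
    finally:
        try:
            conn.quit()
        except OSError:
            conn.close()
    return failures
```

The point of the design is that a slow or wedged MTA turns into an ordinary, logged delivery failure after `timeout` seconds rather than a runner that must be killed with `kill -9`.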

I just found this in Mailman's smtp-failures log:

Feb 01 14:28:49 2018 (1674) Low level smtp error: [Errno 111] Connection refused, msgid: <B51BA08829F27146A07699F58B941234A27397BF at EX10DAG2.intern.xxx>
Feb 01 14:28:49 2018 (1674) delivery to xxx at uni-koeln.de failed with code -1: [Errno 111] Connection refused

I can't prove it, but this timestamp appears to coincide with the moment the
OutgoingRunner got stuck, judging by the age of the queue files. The
receiving SMTP server was under heavy load at that moment, so it may well
have refused the connection.
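
Correlating the log with the queue is easier once the timestamps are parsed. The sketch below assumes the line format shown above (`Mon DD HH:MM:SS YYYY (pid) message`) and that the log and the queue-file modification times are both in the server's local time. `parse_log_line` and `entries_near` are hypothetical helpers.

```python
import re
from datetime import datetime, timedelta

# Assumed format, e.g.:
# Feb 01 14:28:49 2018 (1674) delivery to ... failed with code -1: ...
LOG_LINE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \((\d+)\) (.*)$")


def parse_log_line(line):
    """Return (timestamp, pid, message), or None if the line doesn't match."""
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    when = datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
    return when, int(m.group(2)), m.group(3)


def entries_near(lines, moment, window_seconds=120):
    """Yield parsed entries logged within window_seconds of `moment`."""
    window = timedelta(seconds=window_seconds)
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None and abs(entry[0] - moment) <= window:
            yield entry
```

A queue file's time can be obtained with `datetime.fromtimestamp(os.path.getmtime(path))` and passed as `moment`.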

The message was delivered successfully after I killed the stuck runner and 
restarted the service. I wasn't able to find anything pertinent on the 
receiving server.

If this should happen again, what should we look for? Would a gdb backtrace 
be helpful?
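
On the gdb question: a Python-level stack is usually more informative than a C backtrace of the interpreter. One option, sketched below as a hypothetical local addition that is not part of Mailman, is a signal handler that prints the interrupted frame's stack, so that sending the signal to the runner's PID reports where it is waiting. SIGUSR2 is chosen on the assumption that the runner doesn't already use it. Check before wiring this in.

```python
import signal
import sys
import traceback


def install_stack_dumper(sig=signal.SIGUSR2, out=None):
    """On `sig`, print the Python stack of the interrupted main thread.

    Hypothetical helper: the handler only writes a traceback to `out`
    (stderr by default); the process keeps running afterwards.
    """
    def handler(signum, frame):
        stream = out if out is not None else sys.stderr
        stream.write("Got signal %d, current Python stack:\n" % signum)
        traceback.print_stack(frame, file=stream)
        stream.flush()

    signal.signal(sig, handler)
```

A pure-Python handler runs only once the interpreter regains control. On Python 3.3+, the standard library's `faulthandler.register(signal.SIGUSR2, all_threads=True)` dumps all threads directly from the C-level signal handler, which makes it the more robust choice where it is available.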
--
Sebastian Hagedorn - Weyertal 121, Zimmer 2.02
Regionales Rechenzentrum (RRZK)
Universität zu Köln / Cologne University - Tel. +49-221-470-89578
