[Mailman-Users] Gateway Timeout Issue

Mark Sapiro mark at msapiro.net
Tue Jan 7 02:46:39 CET 2014


On 01/06/2014 05:31 PM, Chuck Weinstock wrote:
> Thanks!
> 
> Yes to the stale lock problem. Regarding the other problem: the last time it shut down was January 1. Here are some of the qrunner log entries from just before that:
> 
>> Dec 30 18:17:20 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 2209, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 30 18:17:23 2013 (16892) ArchRunner qrunner started.
>> Dec 31 00:21:05 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 16892, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 31 00:21:10 2013 (31527) ArchRunner qrunner started.
>> Dec 31 06:25:01 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 15347, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
>> Dec 31 06:25:04 2013 (13794) IncomingRunner qrunner started.
>> Dec 31 12:28:51 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 13794, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
>> Dec 31 12:28:53 2013 (28877) IncomingRunner qrunner started.
>> Dec 31 18:32:44 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 31527, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 31 18:32:46 2013 (10916) ArchRunner qrunner started.
>> Jan 01 00:36:02 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 12268, sig: 9, sts: None, class: OutgoingRunner, slice: 1/1) [restarting]
>> Jan 01 00:36:04 2014 (25317) OutgoingRunner qrunner started.
>> Jan 01 12:43:48 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 10916, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Jan 01 12:43:50 2014 (22804) ArchRunner qrunner started.
>> Jan 01 15:22:22 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 28877, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
>> Jan 01 15:22:22 2014 (8351) Qrunner IncomingRunner reached maximum restart limit of 10, not restarting.


All of the exits above are from signal 9 (SIGKILL). Do you have a cron
job or some other process that is sending SIGKILL to the qrunners, either
to keep their memory footprint small or for some other reason? See the
FAQ at <http://wiki.list.org/x/94A9>.
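
To see whether the kills follow a schedule, the log excerpt can be
summarized mechanically. The following is only a minimal sketch, not part
of Mailman: the names parse_exits and sigkill_summary are made up for
illustration, and SAMPLE is just the first three exits quoted above with
the quote markers stripped. It assumes the qrunner log entries look exactly
like the ones in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Matches a "subprocess exit" entry plus its details, whether or not the
# entry has been wrapped onto a second line as in the quoted excerpt.
EXIT_RE = re.compile(
    r"(\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \(\d+\) "
    r"Master qrunner detected subprocess exit\s+"
    r"\(pid: (\d+), sig: (\w+), sts: (\w+), class: (\w+), slice: (\d+/\d+)\)"
)


def parse_exits(log_text):
    """Return one dict per subprocess exit found in log_text (hypothetical helper)."""
    exits = []
    for m in EXIT_RE.finditer(log_text):
        exits.append({
            "when": datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y"),
            "pid": int(m.group(2)),
            "sig": m.group(3),
            "runner": m.group(5),
            "slice": m.group(6),
        })
    return exits


def sigkill_summary(exits):
    """Count signal 9 exits per runner class and the time between successive kills."""
    killed = [e for e in exits if e["sig"] == "9"]
    per_runner = Counter(e["runner"] for e in killed)
    gaps = [b["when"] - a["when"] for a, b in zip(killed, killed[1:])]
    return per_runner, gaps


# First three exits from the excerpt above, quote markers removed.
SAMPLE = """\
Dec 30 18:17:20 2013 (8351) Master qrunner detected subprocess exit
(pid: 2209, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
Dec 30 18:17:23 2013 (16892) ArchRunner qrunner started.
Dec 31 00:21:05 2013 (8351) Master qrunner detected subprocess exit
(pid: 16892, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
Dec 31 00:21:10 2013 (31527) ArchRunner qrunner started.
Dec 31 06:25:01 2013 (8351) Master qrunner detected subprocess exit
(pid: 15347, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
"""

per_runner, gaps = sigkill_summary(parse_exits(SAMPLE))
print(per_runner)
print([str(g) for g in gaps])
```

For this sample the gaps between kills come out a little over six hours
each. Regular spacing like that is what you would expect from a scheduled
job rather than from random failures.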


> Also, there are no errors in the error log around the same time. I am now seeing a bunch of errors like:
> 
>> Jan 05 20:46:54 2014 (1522) Uncaught runner exception: [Errno 2] No such file or directory: '/usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.pck'
>> Jan 05 20:46:54 2014 (1522) Traceback (most recent call last):
>>   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
>>     msg, msgdata = self._switchboard.dequeue(filebase)
>>   File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 154, in dequeue
>>     fp = open(filename)
>> IOError: [Errno 2] No such file or directory: '/usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.pck'
>>
>> Jan 05 20:46:54 2014 (1522) Skipping and preserving unparseable message: 1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a
>> Jan 05 20:46:54 2014 (1522) Failed to unlink/preserve backup file: /usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.bak
>> [Errno 2] No such file or directory
> 
> 
> I think these are related to some pck files that I deleted by hand because I thought they were causing the stale lock problem.


I think these occur because more than one qrunner is processing the
same slice of the same queue. See the FAQ at <http://wiki.list.org/x/_4A9>.
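
One way to check for this is to look at the process list for two qrunners
claiming the same runner and slice. The following is a rough sketch only,
with some assumptions: that the qrunners were started with an option of
the form --runner=Name:slice:count, and that the listing comes from
something like `ps -eo pid,args` so that the PID is the first column. The
function name find_duplicate_runners and the PIDs and paths in SAMPLE_PS
are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed qrunner command-line option: --runner=Name:slice:count
RUNNER_RE = re.compile(r"--runner=(\w+):(\d+):(\d+)")


def find_duplicate_runners(ps_output):
    """Map (runner, slice, count) to its PIDs for every slice claimed more than once."""
    pids = defaultdict(list)
    for line in ps_output.splitlines():
        m = RUNNER_RE.search(line)
        if not m:
            continue  # header line or a process that is not a qrunner
        pid = int(line.split()[0])  # assumes the PID is the first column
        key = (m.group(1), int(m.group(2)), int(m.group(3)))
        pids[key].append(pid)
    return {key: found for key, found in pids.items() if len(found) > 1}


# Hypothetical `ps -eo pid,args` output, for illustration only.
SAMPLE_PS = """\
  PID COMMAND
 1522 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
 1523 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
 9041 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
"""

duplicates = find_duplicate_runners(SAMPLE_PS)
print(duplicates)
```

Any entry in the result means two processes are working the same slice of
the same queue, so one can remove a queue file while the other is about to
open it, which would produce "No such file or directory" errors like the
ones you quoted.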

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

