[Mailman-Users] Gateway Timeout Issue
Mark Sapiro
mark at msapiro.net
Tue Jan 7 02:46:39 CET 2014
On 01/06/2014 05:31 PM, Chuck Weinstock wrote:
> Thanks!
>
> Yes to the stale lock problem. Regarding the other problem…the last time it shut down was January 1. Here are some of the qrunner log entries just prior to that:
>
>> Dec 30 18:17:20 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 2209, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 30 18:17:23 2013 (16892) ArchRunner qrunner started.
>> Dec 31 00:21:05 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 16892, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 31 00:21:10 2013 (31527) ArchRunner qrunner started.
>> Dec 31 06:25:01 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 15347, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
>> Dec 31 06:25:04 2013 (13794) IncomingRunner qrunner started.
>> Dec 31 12:28:51 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 13794, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
>> Dec 31 12:28:53 2013 (28877) IncomingRunner qrunner started.
>> Dec 31 18:32:44 2013 (8351) Master qrunner detected subprocess exit
>> (pid: 31527, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Dec 31 18:32:46 2013 (10916) ArchRunner qrunner started.
>> Jan 01 00:36:02 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 12268, sig: 9, sts: None, class: OutgoingRunner, slice: 1/1) [restarting]
>> Jan 01 00:36:04 2014 (25317) OutgoingRunner qrunner started.
>> Jan 01 12:43:48 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 10916, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
>> Jan 01 12:43:50 2014 (22804) ArchRunner qrunner started.
>> Jan 01 15:22:22 2014 (8351) Master qrunner detected subprocess exit
>> (pid: 28877, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
>> Jan 01 15:22:22 2014 (8351) Qrunner IncomingRunner reached maximum restart limit of 10, not restarting.
Every exit above was caused by signal 9 (SIGKILL). Do you have a cron job
or some other process that SIGKILLs the qrunners, either to keep their
memory use down or for some other reason? See the FAQ at <http://wiki.list.org/x/94A9>.
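
To see at a glance which runners are being killed and by which signal, the "subprocess exit" entries can be tallied. This is only a rough sketch: the regular expression is modelled on the entries quoted above, and the function name and sample text are made up for illustration. Point it at the contents of your own qrunner log.

```python
import re
from collections import Counter

# Matches the detail part of a "Master qrunner detected subprocess exit"
# entry, as it appears in the log lines quoted above.
EXIT_RE = re.compile(
    r"\(pid: (?P<pid>\d+), sig: (?P<sig>\w+), sts: (?P<sts>\w+), "
    r"class: (?P<cls>\w+), slice: (?P<slice>\d+/\d+)\)"
)

def count_exits(log_text):
    """Return a Counter keyed by (runner class, signal) for each exit found."""
    counts = Counter()
    for m in EXIT_RE.finditer(log_text):
        counts[(m.group("cls"), m.group("sig"))] += 1
    return counts

# Three entries copied from the quoted log, used here as sample input.
sample = """\
Dec 30 18:17:20 2013 (8351) Master qrunner detected subprocess exit
(pid: 2209, sig: 9, sts: None, class: ArchRunner, slice: 1/1) [restarting]
Dec 31 06:25:01 2013 (8351) Master qrunner detected subprocess exit
(pid: 15347, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
Jan 01 15:22:22 2014 (8351) Master qrunner detected subprocess exit
(pid: 28877, sig: 9, sts: None, class: IncomingRunner, slice: 1/1) [restarting]
"""

for (cls, sig), n in sorted(count_exits(sample).items()):
    print(f"{cls}: {n} exit(s) with sig {sig}")
```

If every count is against sig 9, something outside Mailman is most likely sending the kills, since the log shows a signal rather than an exit status for each of these.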
> Also there are no errors in the error log around the same time. I am seeing a bunch of errors (now) like:
>
>> Jan 05 20:46:54 2014 (1522) Uncaught runner exception: [Errno 2] No such file or directory: '/usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.pck'
>> Jan 05 20:46:54 2014 (1522) Traceback (most recent call last):
>>   File "/usr/local/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
>>     msg, msgdata = self._switchboard.dequeue(filebase)
>>   File "/usr/local/mailman/Mailman/Queue/Switchboard.py", line 154, in dequeue
>>     fp = open(filename)
>> IOError: [Errno 2] No such file or directory: '/usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.pck'
>>
>> Jan 05 20:46:54 2014 (1522) Skipping and preserving unparseable message: 1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a
>> Jan 05 20:46:54 2014 (1522) Failed to unlink/preserve backup file: /usr/local/mailman/qfiles/in/1388971910.759851+b18e7af8cb0632a2d9f551c9e39053510b278e9a.bak
>> [Errno 2] No such file or directory
>
>
> I think these are related to some pck files that I hand deleted because I thought they were causing the stale lock problem.
I think these errors occur because you have more than one qrunner processing
the same slice of the same queue, so one runner removes a queue file before
the other can open it. See the FAQ at <http://wiki.list.org/x/_4A9>.
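
One way to check for duplicates is to group the running qrunner processes by runner class and slice. The sketch below assumes that each qrunner command line carries an argument of the form --runner=Class:slice:range and that the process list has the PID in the first column (for example from `ps -eo pid,args`); both are assumptions to verify on your system. The function name and the PIDs in the sample are invented for illustration.

```python
import re
from collections import defaultdict

# Assumption: qrunner command lines contain "--runner=Class:slice:range".
RUNNER_RE = re.compile(r"--runner=(\w+):(\d+):(\d+)")

def find_duplicate_runners(ps_lines):
    """Map (class, slice, range) to its PIDs, keeping only keys with >1 PID."""
    seen = defaultdict(list)
    for line in ps_lines:
        m = RUNNER_RE.search(line)
        if not m:
            continue  # not a qrunner line
        pid = int(line.split()[0])  # assumes PID is the first column
        key = (m.group(1), int(m.group(2)), int(m.group(3)))
        seen[key].append(pid)
    return {key: pids for key, pids in seen.items() if len(pids) > 1}

# Hypothetical process listing; the PIDs are made up.
sample_ps = [
    " 4101 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s",
    " 4102 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s",
    " 4103 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s",
]

for (cls, slice_no, slice_range), pids in find_duplicate_runners(sample_ps).items():
    print(f"{cls} slice {slice_no} of {slice_range} has {len(pids)} processes: {pids}")
```

Any runner class and slice reported here has more than one process competing for the same queue files.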
--
Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
San Francisco Bay Area, California better use your sense - B. Dylan