Re: [Mailman-Users] Mail stuck in qfiles/in
I think I solved my own problem. Instead of just restarting qrunner, I did a stop and then a start. That did it.
The clue came when I ran ps auxww | egrep 'p[y]thon', as FAQ 3.14 suggests: instead of eight processes, there were only two.
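Instead of just counting python processes, you can check which runners are missing. The minimal Python sketch below rests on one assumption: that each child qrunner's command line includes an argument of the form --runner=Name:slice:count, which is how Mailman 2.1's mailmanctl appears to start its runners. The function name is hypothetical. The eight class names are the ones that appear in this thread.

```python
import re

# The eight qrunner classes named in this thread.
EXPECTED_RUNNERS = frozenset({
    "ArchRunner", "BounceRunner", "CommandRunner", "IncomingRunner",
    "NewsRunner", "OutgoingRunner", "RetryRunner", "VirginRunner",
})

# Assumption: each child's command line carries --runner=Name:slice:count.
RUNNER_ARG = re.compile(r"--runner=(\w+):\d+:\d+")


def missing_runners(ps_output: str) -> set:
    """Return the expected runner classes absent from the given ps output."""
    running = set(RUNNER_ARG.findall(ps_output))
    return set(EXPECTED_RUNNERS - running)
```

Pass it the text printed by ps auxww. If, as Mark suggests below, only RetryRunner survived alongside the master, the function would report the other seven.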
So I'm back in business. Any follow-up thoughts on why this happened?
Anyway, thanks!
Allan
Allan Trick wrote:
So I'm back in business. Any follow-up thoughts on why this happened?
Look at Mailman's error and qrunner logs from the time the lists stopped, for clues as to why the other seven qrunners died (or why they all died and only RetryRunner was restarted).
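You can script the step of narrowing a log to the window in which the lists stopped. This minimal sketch assumes each entry starts with a "Mon DD HH:MM:SS YYYY (pid)" prefix, as in the qrunner log quoted later in this thread, and that month names are English abbreviations. The function name is illustrative. Lines without that prefix, such as traceback continuations in the error log, are skipped.

```python
import re
from datetime import datetime

# Entry prefix as seen in the qrunner log quoted in this thread, e.g.
#   "Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit ..."
ENTRY = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) \((\d+)\) (.*)$")


def entries_between(lines, start, end):
    """Yield (timestamp, pid, message) for entries logged between start and end.

    Lines that do not begin with a timestamp (e.g. traceback continuation
    lines) are skipped, so a multi-line entry is reduced to its first line.
    """
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m is None:
            continue
        # %b assumes English month abbreviations (C locale).
        when = datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
        if start <= when <= end:
            yield when, int(m.group(2)), m.group(3)
```

For example, iterating over an open log file with a window from 11:30 to 12:05 on Jan 31 2007 would isolate the burst of exits shown later in the thread. Log file locations depend on the installation.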
-- 
Mark Sapiro <msapiro@value.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
At 11:15 AM 2/2/2007, Mark Sapiro wrote:
Look at Mailman's error and qrunner logs from the time the lists stopped, for clues as to why the other seven qrunners died (or why they all died and only RetryRunner was restarted).
There's nothing in the error log. But qrunner's might have a clue. I'm not sure how to read this:
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 29673, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 29273, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 29276, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 1548, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 24311, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 1546, sig: None, sts: 1, class: NewsRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 1544, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (18640) IncomingRunner qrunner started.
Jan 31 11:38:51 2007 (18643) BounceRunner qrunner started.
Jan 31 11:38:51 2007 (18642) VirginRunner qrunner started.
Jan 31 11:38:51 2007 (18641) ArchRunner qrunner started.
Jan 31 11:38:51 2007 (18644) NewsRunner qrunner started.
Jan 31 11:38:51 2007 (18639) OutgoingRunner qrunner started.
Jan 31 11:38:52 2007 (18645) CommandRunner qrunner started.
Jan 31 12:00:36 2007 (1541) Master qrunner detected subprocess exit (pid: 18639, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 26785, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 26786, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (26787) OutgoingRunner qrunner started.
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18645, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18640, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18641, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18642, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18643, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26792, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (26793) IncomingRunner qrunner started.
Jan 31 12:00:38 2007 (26795) VirginRunner qrunner started.
Jan 31 12:00:38 2007 (26794) ArchRunner qrunner started.
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26797, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26796, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26801, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (26798) CommandRunner qrunner started.
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26803, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26804, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26805, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26806, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26811, sig: None, sts: 127, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26806, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26811, sig: None, sts: 127, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:42 2007 (1541) Qrunner BounceRunner reached maximum restart limit of 10, not restarting.
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26793, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26794, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26787, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26795, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26798, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:46 2007 (26843) IncomingRunner qrunner started.
Jan 31 12:00:46 2007 (26845) OutgoingRunner qrunner started.
Jan 31 12:00:46 2007 (26844) ArchRunner qrunner started.
Jan 31 12:00:46 2007 (1541) Master qrunner detected subprocess exit (pid: 26846, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 12:00:46 2007 (26847) CommandRunner qrunner started.
Jan 31 12:00:46 2007 (26848) VirginRunner qrunner started.
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26843, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 18644, sig: None, sts: 1, class: NewsRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26847, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26844, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26958, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (26955) IncomingRunner qrunner started.
Jan 31 12:01:08 2007 (26956) NewsRunner qrunner started.
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26957, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (26959) ArchRunner qrunner started.
Jan 31 12:01:08 2007 (26962) CommandRunner qrunner started.
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26955, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26969, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26970, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26971, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:10 2007 (1541) Master qrunner detected subprocess exit (pid: 26972, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:10 2007 (1541) Qrunner IncomingRunner reached maximum restart limit of 10, not restarting.
Jan 31 12:01:10 2007 (1541) Master qrunner detected subprocess exit (pid: 26848, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
......
And it goes on and on like that. Why would it not have been able to restart??
Thanks for any interpretation!
Allan
Allan Trick wrote:
There's nothing in the error log. But qrunner's might have a clue. I'm not sure how to read this:
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 29673, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
This says OutgoingRunner exited with status 1 and was not killed by a signal. That in itself is not very informative, but ...
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 29273, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 29276, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 1548, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 24311, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 1546, sig: None, sts: 1, class: NewsRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (1541) Master qrunner detected subprocess exit (pid: 1544, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 11:38:51 2007 (18640) IncomingRunner qrunner started.
Jan 31 11:38:51 2007 (18643) BounceRunner qrunner started.
Jan 31 11:38:51 2007 (18642) VirginRunner qrunner started.
Jan 31 11:38:51 2007 (18641) ArchRunner qrunner started.
Jan 31 11:38:51 2007 (18644) NewsRunner qrunner started.
Jan 31 11:38:51 2007 (18639) OutgoingRunner qrunner started.
Jan 31 11:38:52 2007 (18645) CommandRunner qrunner started.
At about 11:38:51, every runner except RetryRunner exited and was restarted. Then everything seemed OK for about 22 minutes.
Jan 31 12:00:36 2007 (1541) Master qrunner detected subprocess exit (pid: 18639, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 26785, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 26786, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Then three OutgoingRunners died: the one started 22 minutes earlier (pid 18639) and two others (pids 26785 and 26786). Perhaps 26785 died before logging its 'started' message, and 26786 was then started and did the same thing.
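Mark's reading of each exit entry can be applied mechanically: sig: None means the runner was not killed by a signal, and sts is its exit status. This minimal Python sketch assumes the exact field layout of the entries quoted here. The regular expression, type, and function names are illustrative and are not part of Mailman.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Matches the "Master qrunner detected subprocess exit (...)" entries quoted here.
EXIT_ENTRY = re.compile(
    r"Master qrunner detected subprocess exit "
    r"\(pid: (\d+), sig: (\w+), sts: (\w+), class: (\w+), slice: (\d+/\d+)\)"
)


class RunnerExit(NamedTuple):
    pid: int
    sig: Optional[int]     # None: the log recorded no signal
    status: Optional[int]  # exit status, e.g. 1 or 127 in this log
    runner: str            # qrunner class, e.g. "OutgoingRunner"
    slice_spec: str        # e.g. "1/1"


def _maybe_int(field: str) -> Optional[int]:
    """Convert a numeric field, mapping the literal 'None' to None."""
    return None if field == "None" else int(field)


def parse_exit(line: str) -> Optional[RunnerExit]:
    """Return the fields of an exit entry, or None for any other line."""
    m = EXIT_ENTRY.search(line)
    if m is None:
        return None
    pid, sig, sts, runner, slice_spec = m.groups()
    return RunnerExit(int(pid), _maybe_int(sig), _maybe_int(sts), runner, slice_spec)


def exits_per_runner(lines: Iterable[str]) -> Counter:
    """Count exit entries per qrunner class; other lines are ignored."""
    return Counter(e.runner for e in map(parse_exit, lines) if e is not None)
```

Applied to the three entries Mark quotes just above, exits_per_runner counts three OutgoingRunner exits within about a second.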
Jan 31 12:00:37 2007 (26787) OutgoingRunner qrunner started.
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18645, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18640, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18641, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18642, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 12:00:37 2007 (1541) Master qrunner detected subprocess exit (pid: 18643, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26792, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (26793) IncomingRunner qrunner started.
Jan 31 12:00:38 2007 (26795) VirginRunner qrunner started.
Jan 31 12:00:38 2007 (26794) ArchRunner qrunner started.
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26797, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26796, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26801, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (26798) CommandRunner qrunner started.
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26803, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26804, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:38 2007 (1541) Master qrunner detected subprocess exit (pid: 26805, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26806, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26811, sig: None, sts: 127, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26806, sig: None, sts: 1, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:41 2007 (1541) Master qrunner detected subprocess exit (pid: 26811, sig: None, sts: 127, class: BounceRunner, slice: 1/1) [restarting]
Jan 31 12:00:42 2007 (1541) Qrunner BounceRunner reached maximum restart limit of 10, not restarting.
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26793, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26794, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26787, sig: None, sts: 1, class: OutgoingRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26795, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 12:00:45 2007 (1541) Master qrunner detected subprocess exit (pid: 26798, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:00:46 2007 (26843) IncomingRunner qrunner started.
Jan 31 12:00:46 2007 (26845) OutgoingRunner qrunner started.
Jan 31 12:00:46 2007 (26844) ArchRunner qrunner started.
Jan 31 12:00:46 2007 (1541) Master qrunner detected subprocess exit (pid: 26846, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
Jan 31 12:00:46 2007 (26847) CommandRunner qrunner started.
Jan 31 12:00:46 2007 (26848) VirginRunner qrunner started.
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26843, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 18644, sig: None, sts: 1, class: NewsRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26847, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26844, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26958, sig: None, sts: 1, class: ArchRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (26955) IncomingRunner qrunner started.
Jan 31 12:01:08 2007 (26956) NewsRunner qrunner started.
Jan 31 12:01:08 2007 (1541) Master qrunner detected subprocess exit (pid: 26957, sig: None, sts: 1, class: CommandRunner, slice: 1/1) [restarting]
Jan 31 12:01:08 2007 (26959) ArchRunner qrunner started.
Jan 31 12:01:08 2007 (26962) CommandRunner qrunner started.
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26955, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26969, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26970, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:09 2007 (1541) Master qrunner detected subprocess exit (pid: 26971, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:10 2007 (1541) Master qrunner detected subprocess exit (pid: 26972, sig: None, sts: 1, class: IncomingRunner, slice: 1/1) [restarting]
Jan 31 12:01:10 2007 (1541) Qrunner IncomingRunner reached maximum restart limit of 10, not restarting.
Jan 31 12:01:10 2007 (1541) Master qrunner detected subprocess exit (pid: 26848, sig: None, sts: 1, class: VirginRunner, slice: 1/1) [restarting]
......
And it goes on and on like that. Why would it not have been able to restart??
Beginning at 12:00:37, the runners are dying as fast as they can be restarted. In some cases they seem to die before even logging their 'started' message, which they write before they actually begin processing their queues. That seems to point to some external OS condition as the cause. It is curious that RetryRunner seems to be exempt.
Beyond suspecting that this is an external OS problem rather than something internal to Mailman, I don't have any ideas.
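As for why the runners stayed down, the log's "reached maximum restart limit of 10, not restarting" entries show that the master gives up on a runner after a fixed number of restarts. The Python sketch below is a simplified model of that policy, not Mailman's actual mailmanctl code. The function name is hypothetical. The model also assumes, for illustration only, that the count is kept per runner class for the life of the master process.

```python
from collections import Counter
from typing import Iterable, List

MAX_RESTARTS = 10  # the limit reported in the qrunner log


def abandoned_runners(exits: Iterable[str], max_restarts: int = MAX_RESTARTS) -> List[str]:
    """Replay runner-class exits in order; return classes the master gives up on.

    Simplified model (an assumption): each class may be restarted at most
    max_restarts times over the life of the master, and the count never resets.
    """
    restarts = Counter()
    abandoned: List[str] = []
    for runner in exits:
        if runner in abandoned:
            continue  # nothing of this class is running any more
        if restarts[runner] < max_restarts:
            restarts[runner] += 1  # master starts a replacement process
        else:
            abandoned.append(runner)  # "... not restarting"
    return abandoned
```

Under this model, a runner that exits on every start uses up its budget within seconds, which fits the pace of the entries above. A runner that never exits, as RetryRunner apparently did not, keeps its budget. The model would also mean that only a fresh master starts with clean counters. That is consistent with a stop followed by a start fixing things, although the thread does not confirm that this is the reason.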
-- 
Mark Sapiro