Re: [Mailman-Users] log errors after restart

Con Wieland wrote:
Even in 2.1.9, these messages are removed when you see the "Ignoring unparseable message:" entry. If you continue to see these, they are probably malformed spam messages tha have nothing to do with the outage and restart.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

More issues. I had several of these messages:
Dec 24 06:57:23 2012 (17125) Ignoring unparseable message: 1354049125.0870931+f5e280e91f2ed298fb5 62d3690311d1e47d24dee
I found them in the bounce directory and removed them. I then restarted mailman with no errors but I noticed that I now have a lot of qfiles that don't seem to be getting processed.
/usr/local/mailman/qfiles/archive 153 /usr/local/mailman/qfiles/bad 542
I'm not sure what to do but it seems like something isn't right although it processes my test mail correctly.
Any ideas would be appreciated. con
I found them in the ~mailman/qfiles/bounces and removed them and restarted mailman. Now I get no more errors but I have On Dec 21, 2012, at 1:27 PM, Mark Sapiro wrote:

On 12/24/2012 7:48 AM, Con Wieland wrote:
So they were bounces that were unparseable for some reason. Possibly a spam post gets rejected or held and a notice is sent to the probably spoofed sender and that bounces. Except that the above message id is not a Mailman generated ID as would be found in a reject or held message notice. Also, you shouldn't find any queue entry corresponding the above log message, because that message says the queue entry was removed and ignored.
/usr/local/mailman/qfiles/archive 153
Is ArchRunner running? What are the extensions on the names of these 153 entries.
/usr/local/mailman/qfiles/bad 542
The 'bad' queue does not get processed. This is where unparseable messages are preserved for inspection (with a .psv extension) and where messages with no content after content filtering are saved if the list's filter_action = Preserve.
See the documention of settings QRUNNER_SAVE_BAD_MESSAGES, BAD_SHUNT_STALE_AFTER and BAD_SHUNT_ARCHIVE_DIRECTORY in the "Qrunner defaults" section of Defaults.py for more info, although if yours is an older Mailman, you may not have all these features/settings.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Dec 24, 2012, at 8:46 AM, Mark Sapiro wrote:
looks to be:
ps -ef |grep -i qrunner mailman 17786 17123 0 07:41:53 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=RetryRunner:0:1 - mailman 17785 17123 0 07:41:53 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=NewsRunner:0:1 -s mailman 17781 17123 0 07:41:52 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=BounceRunner:0:1 mailman 17783 17123 0 07:41:52 ? 0:05 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0: mailman 17780 17123 0 07:41:52 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=CommandRunner:0:1 mailman 17782 17123 0 07:41:52 ? 0:04 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0: mailman 17784 17123 0 07:41:52 ? 0:03 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=VirginRunner:0:1
the file extensions are .pck
thanks for looking con

On 12/24/2012 8:53 AM, Con Wieland wrote:
Huh? I don't see ArchRunner in the above list. Check Mailman's 'qrunner' and 'error' logs for clues as to why it's not.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Dec 24, 2012, at 8:58 AM, Mark Sapiro wrote:
Here's all I can find from the last restart, this is from the qrunner log nothing in the error log:
qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17780) CommandRunner qrunner started. qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:53 2012 (17781) BounceRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17782) IncomingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17783) OutgoingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17784) VirginRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17785) NewsRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17786) RetryRunner qrunner started.
So no problem removing these once I peruse them for possible causes, right?
/usr/local/mailman/qfiles/bad 542
con

so the other thing I just noticed, which I think is what your getting at is that
--runner=ArchRunner:0:1 -s
is not running and I notice it is on another install of mailman that I have. But I can't find any thing error wise other than the below. Is there a way to start it manually (maybe that would produce an error)
con
On Dec 24, 2012, at 8:58 AM, Mark Sapiro wrote:
Here's all I can find from the last restart, this is from the qrunner log nothing in the error log:
qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17780) CommandRunner qrunner started. qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:53 2012 (17781) BounceRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17782) IncomingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17783) OutgoingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17784) VirginRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17785) NewsRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17786) RetryRunner qrunner started.
So no problem removing these once I peruse them for possible causes, right?
/usr/local/mailman/qfiles/bad 542
con

On 12/24/2012 9:19 AM, Con Wieland wrote:
You can start it manually, but don't. Instead of 'restarting' Mailman which won't restart ArchRunner if ArchRunner has already exceeded it's died and autorestarted 10 times limit some time ago, 'stop' Mailman, make sure all the runners exit and then 'start' Mailman.
Also, don't remove entries from the archive queue before determining the issue with ArchRunner. These are messages to be archived and if you remove them, they won't be archived.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Thanks Mark, that did the trick, ArchRunner started right up without any errors. Fingers crossed the only thing left are the 542 messages in qfiles/bad. They mostly seem to just be message headers that were bounces or the body was omitted.... Can they safely be deleted?
con
On Dec 24, 2012, at 9:28 AM, Mark Sapiro wrote:

On 12/24/2012 12:31 PM, Con Wieland wrote:
Thanks Mark, that did the trick, ArchRunner started right up without any errors. Fingers crossed the only thing left are the 542 messages in qfiles/bad. They mostly seem to just be message headers that were bounces or the body was omitted.... Can they safely be deleted?
In your case, the messages in the 'bad' queue are almost certainly all messages that had no content left after content filtering on lists for which filter_action = Preserve. Yes, you can just remove them.
Further, since no one is apparently pro-actively looking for these, I suggest setting
OWNERS_CAN_PRESERVE_FILTERED_MESSAGES = No
in mm_cfg.py so that these messages will not be moved to the 'bad' queue in the future.
You could also run something like:
#!/bin/bash
cd /path/to/mailman
for list in mmp/bin/list_lists --bare
; do
echo $list
mmp/bin/config_list -o - $list | grep "filter_action = 3"
done
to find those lists with filter_action = Preserve and then you can change them to one of the other settings.
As far as ArchRunner is concerned, if your logs go back far enough, you will find messages in the 'qrunner' log about ArchRunner dieing and restarting until its restart limit was reached and possibly corresponding 'error' log messages. This may have been due to corrupt files after the power outage hard crash, or it may be something that should be investigated.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

More issues. I had several of these messages:
Dec 24 06:57:23 2012 (17125) Ignoring unparseable message: 1354049125.0870931+f5e280e91f2ed298fb5 62d3690311d1e47d24dee
I found them in the bounce directory and removed them. I then restarted mailman with no errors but I noticed that I now have a lot of qfiles that don't seem to be getting processed.
/usr/local/mailman/qfiles/archive 153 /usr/local/mailman/qfiles/bad 542
I'm not sure what to do but it seems like something isn't right although it processes my test mail correctly.
Any ideas would be appreciated. con
I found them in the ~mailman/qfiles/bounces and removed them and restarted mailman. Now I get no more errors but I have On Dec 21, 2012, at 1:27 PM, Mark Sapiro wrote:

On 12/24/2012 7:48 AM, Con Wieland wrote:
So they were bounces that were unparseable for some reason. Possibly a spam post gets rejected or held and a notice is sent to the probably spoofed sender and that bounces. Except that the above message id is not a Mailman generated ID as would be found in a reject or held message notice. Also, you shouldn't find any queue entry corresponding the above log message, because that message says the queue entry was removed and ignored.
/usr/local/mailman/qfiles/archive 153
Is ArchRunner running? What are the extensions on the names of these 153 entries.
/usr/local/mailman/qfiles/bad 542
The 'bad' queue does not get processed. This is where unparseable messages are preserved for inspection (with a .psv extension) and where messages with no content after content filtering are saved if the list's filter_action = Preserve.
See the documention of settings QRUNNER_SAVE_BAD_MESSAGES, BAD_SHUNT_STALE_AFTER and BAD_SHUNT_ARCHIVE_DIRECTORY in the "Qrunner defaults" section of Defaults.py for more info, although if yours is an older Mailman, you may not have all these features/settings.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Dec 24, 2012, at 8:46 AM, Mark Sapiro wrote:
looks to be:
ps -ef |grep -i qrunner mailman 17786 17123 0 07:41:53 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=RetryRunner:0:1 - mailman 17785 17123 0 07:41:53 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=NewsRunner:0:1 -s mailman 17781 17123 0 07:41:52 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=BounceRunner:0:1 mailman 17783 17123 0 07:41:52 ? 0:05 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0: mailman 17780 17123 0 07:41:52 ? 0:00 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=CommandRunner:0:1 mailman 17782 17123 0 07:41:52 ? 0:04 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0: mailman 17784 17123 0 07:41:52 ? 0:03 /usr/local/bin/python /usr/local/mailman/bin/qrunner --runner=VirginRunner:0:1
the file extensions are .pck
thanks for looking con

On 12/24/2012 8:53 AM, Con Wieland wrote:
Huh? I don't see ArchRunner in the above list. Check Mailman's 'qrunner' and 'error' logs for clues as to why it's not.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Dec 24, 2012, at 8:58 AM, Mark Sapiro wrote:
Here's all I can find from the last restart, this is from the qrunner log nothing in the error log:
qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17780) CommandRunner qrunner started. qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:53 2012 (17781) BounceRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17782) IncomingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17783) OutgoingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17784) VirginRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17785) NewsRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17786) RetryRunner qrunner started.
So no problem removing these once I peruse them for possible causes, right?
/usr/local/mailman/qfiles/bad 542
con

so the other thing I just noticed, which I think is what your getting at is that
--runner=ArchRunner:0:1 -s
is not running and I notice it is on another install of mailman that I have. But I can't find any thing error wise other than the below. Is there a way to start it manually (maybe that would produce an error)
con
On Dec 24, 2012, at 8:58 AM, Mark Sapiro wrote:
Here's all I can find from the last restart, this is from the qrunner log nothing in the error log:
qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17549) RetryRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17550) NewsRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17551) VirginRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17552) OutgoingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17553) CommandRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17554) IncomingRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner caught SIGINT. Stopping. qrunner:Dec 24 07:41:51 2012 (17555) BounceRunner qrunner exiting. qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:51 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:52 2012 (17780) CommandRunner qrunner started. qrunner:Dec 24 07:41:52 2012 (17123) Master qrunner detected subprocess exit qrunner:Dec 24 07:41:53 2012 (17781) BounceRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17782) IncomingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17783) OutgoingRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17784) VirginRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17785) NewsRunner qrunner started. qrunner:Dec 24 07:41:53 2012 (17786) RetryRunner qrunner started.
So no problem removing these once I peruse them for possible causes, right?
/usr/local/mailman/qfiles/bad 542
con

On 12/24/2012 9:19 AM, Con Wieland wrote:
You can start it manually, but don't. Instead of 'restarting' Mailman which won't restart ArchRunner if ArchRunner has already exceeded it's died and autorestarted 10 times limit some time ago, 'stop' Mailman, make sure all the runners exit and then 'start' Mailman.
Also, don't remove entries from the archive queue before determining the issue with ArchRunner. These are messages to be archived and if you remove them, they won't be archived.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Thanks Mark, that did the trick, ArchRunner started right up without any errors. Fingers crossed the only thing left are the 542 messages in qfiles/bad. They mostly seem to just be message headers that were bounces or the body was omitted.... Can they safely be deleted?
con
On Dec 24, 2012, at 9:28 AM, Mark Sapiro wrote:

On 12/24/2012 12:31 PM, Con Wieland wrote:
Thanks Mark, that did the trick, ArchRunner started right up without any errors. Fingers crossed the only thing left are the 542 messages in qfiles/bad. They mostly seem to just be message headers that were bounces or the body was omitted.... Can they safely be deleted?
In your case, the messages in the 'bad' queue are almost certainly all messages that had no content left after content filtering on lists for which filter_action = Preserve. Yes, you can just remove them.
Further, since no one is apparently pro-actively looking for these, I suggest setting
OWNERS_CAN_PRESERVE_FILTERED_MESSAGES = No
in mm_cfg.py so that these messages will not be moved to the 'bad' queue in the future.
You could also run something like:
#!/bin/bash
cd /path/to/mailman
for list in mmp/bin/list_lists --bare
; do
echo $list
mmp/bin/config_list -o - $list | grep "filter_action = 3"
done
to find those lists with filter_action = Preserve and then you can change them to one of the other settings.
As far as ArchRunner is concerned, if your logs go back far enough, you will find messages in the 'qrunner' log about ArchRunner dieing and restarting until its restart limit was reached and possibly corresponding 'error' log messages. This may have been due to corrupt files after the power outage hard crash, or it may be something that should be investigated.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (2)
-
Con Wieland
-
Mark Sapiro