
I have a list (several lists, actually) running on Mailman 2.1.11 and it looks as if bounce processing is broken. On the list in question, the following are set:
bounce_processing = Yes
bounce_score_threshold = 1.0
bounce_info_stale_after = 1
bounce_you_are_disabled_warnings = 0
bounce_you_are_disabled_warnings_interval = 7
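To make the expected sequence concrete, here is a deliberately simplified Python model of what these settings should do after a single bounce. It is not Mailman's Bouncer code. The function name `expected_actions` and the one-step logic are assumptions, and the model ignores bounce_info_stale_after and the warning interval. With bounce_score_threshold = 1.0 and bounce_you_are_disabled_warnings = 0, one bounce should be enough to disable and then delete the member, which matches the three log lines quoted next.

```python
from typing import List


def expected_actions(score: float, threshold: float, warnings: int) -> List[str]:
    """Simplified model (assumption, not Mailman's code): the actions expected
    for one member after a bounce has brought their score to `score`."""
    actions = [f"bounce score: {score}"]
    if score >= threshold:
        # Delivery is disabled once the score reaches the threshold.
        actions.append(f"disabling due to bounce score {score} >= {threshold}")
        if warnings == 0:
            # No "you are disabled" warnings to send, so the notices are
            # exhausted immediately and the member should be removed.
            actions.append("deleted after exhausting notices")
    return actions


for line in expected_actions(score=1.0, threshold=1.0, warnings=0):
    print(line)
```

With warnings set above zero, the model stops at "disabling ...", since the member would first receive the configured warnings.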
/var/lib/mailman/logs/bounce shows many entries of this form, in sets of three:
Aug 11 12:35:24 2009 (19017) listname: user@hotmail.com bounce score: 1.0
Aug 11 12:35:24 2009 (19017) listname: user@hotmail.com disabling due to bounce score 1.0 >= 1.0
Aug 11 12:35:24 2009 (19017) listname: user@hotmail.com deleted after exhausting notices
However, both the subscription roster and a grep of the list_members output for user@hotmail.com show that the user is still subscribed, with no nomail flag set, and no notice is sent to the list owner.
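One way to quantify the problem is to cross-check the bounce log against the roster. The Python sketch below assumes the log line format quoted above and that the output of `list_members listname` (one address per line) has been saved to a file; the function name `still_subscribed` is hypothetical and not part of Mailman.

```python
import re

# Matches the "deleted" line of each triple in the bounce log, e.g.
# "... (19017) listname: user@hotmail.com deleted after exhausting notices"
DELETED = re.compile(
    r"\(\d+\) (?P<list>\S+): (?P<addr>\S+) deleted after exhausting notices"
)


def still_subscribed(bounce_log_lines, listname, member_lines):
    """Return addresses the bounce log reports as deleted from `listname`
    that nevertheless still appear in the list_members output."""
    members = {line.strip().lower() for line in member_lines if line.strip()}
    deleted = set()
    for line in bounce_log_lines:
        m = DELETED.search(line)
        if m and m.group("list") == listname:
            deleted.add(m.group("addr").lower())
    return sorted(deleted & members)
```

For example, after `list_members listname > members.txt`, calling `still_subscribed(open('/var/lib/mailman/logs/bounce'), 'listname', open('members.txt'))` would list every "deleted" address that was never actually removed.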
I'm running mailman on Gentoo Linux which uses:
PREFIX = '/usr/lib64/mailman'
VAR_PREFIX = '/var/lib/mailman'
Does anyone have any idea how to troubleshoot this?
--
Lindsay Haisley       | SUPPORT NETWORK NEUTRALITY
FMP Computer Services | --------------------------
512-259-1190          | Boycott Yahoo, RoadRunner, AOL
http://www.fmp.com    | and Verizon

I restarted (twice) the qrunner suite of processes from the system command line using the system init scripts (/etc/init.d/mailman) with two noticeable results.
First, an egregious number of "Bounce action notifications" and "list unsubscribe notifications" went out on bounces for lists on which I'm listed as an owner, including the one that brought this problem to my attention. Some notifications date back a couple of months, so this has apparently been a problem for some time.
Second, many subscribers to the problem list received multiple copies of the most recently queued post. Could this be because I stopped and restarted the qrunners several times? Why would this cause multiple copies to be sent?
I should also note that the bouncing subscribers were _still_ not unsubscribed, nor was the nomail flag set for those for whom a soft bounce was received.
All qrunner processes were (and still are) running, at least according to the process table. Can these processes crash? If so, what can I do to prevent this? If I need to restart the qrunners, how do I avoid sending out multiple copies of posts?
On Tue, 2009-08-11 at 13:02 -0500, Lindsay Haisley wrote:
--
Lindsay Haisley       | "Everything works if you let it"
FMP Computer Services |       (The Roadie)
512-259-1190          |
http://www.fmp.com    |

I restarted (twice) the qrunner suite of processes from the system command line using the system init scripts (/etc/init.d/mailman) with two noticeable results.
First, an egregious number of "Bounce action notifications" and "list unsubscribe notifications" went out on bounces for lists on which I'm listed as an owner, including the one that brought this problem to my attention. Some notifications date back a couple of months, so this has apparently been a problem for some time.
Second, many subscribers to the problem list received multiple copies of the most recently queued post. Could this be because I stopped and restarted the qrunners several times? Why would this cause multiple copies to be sent?
I should also note that the bouncing subscribers were _still_ not unsubscribed, nor was the nomail flag set for those for whom a soft bounce was received.
All qrunner processes were (and still are) running, at least according to the process table. Can these processes crash, or go zombie? If so, what can I do to prevent this? If I need to restart the qrunners, how do I avoid sending out multiple copies of posts?
On Tue, 2009-08-11 at 13:02 -0500, Lindsay Haisley wrote:
--
Lindsay Haisley | "Never expect the people who caused a problem
FMP Computer Services | to solve it." - Albert Einstein
512-259-1190 |
http://www.fmp.com |

Lindsay Haisley wrote:
I would have to see the /etc/init.d/mailman script to know for sure, but I'm guessing there is something in it that recovers old, stale bounce-events-ppppp.pck files. These files were left behind with the offending bounces when the 2.1.11 bug threw the exception that caused BounceRunner to die without saving the updated list with the bouncing member removed.
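To check whether such leftovers exist, a sketch like the following lists bounce-events-*.pck files whose PID no longer belongs to a running process. Assumptions: the files live in Mailman's data directory (on this Gentoo layout, presumably /var/lib/mailman/data), the digits in the file name are the PID of the BounceRunner that wrote it, and the script runs on a POSIX system. `stale_bounce_event_files` is a hypothetical helper; it only reports files and does not touch them.

```python
import os
import re

PCK_NAME = re.compile(r"^bounce-events-(\d+)\.pck$")


def pid_is_running(pid: int) -> bool:
    """POSIX check: signal 0 tests for existence without sending anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def stale_bounce_event_files(data_dir: str) -> list:
    """Return bounce-events-<pid>.pck paths whose writer PID is gone."""
    stale = []
    for name in sorted(os.listdir(data_dir)):
        m = PCK_NAME.match(name)
        if m and not pid_is_running(int(m.group(1))):
            stale.append(os.path.join(data_dir, name))
    return stale
```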
Note that this bug, addressed in my earlier reply, only occurs when bounce_you_are_disabled_warnings = 0.
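Because the problem depends on both the Mailman version and one list setting, a quick check can be sketched in Python. The helper name `hits_2_1_11_bounce_bug` is hypothetical; it encodes only what this thread states (introduced in 2.1.11, fixed in 2.1.12, triggered only when bounce_you_are_disabled_warnings = 0) and assumes that a Gentoo-style revision suffix such as "-r3" does not carry the fix.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '2.1.11' or a Gentoo-style '2.1.9-r3' into (2, 1, 11) / (2, 1, 9)."""
    base = re.split(r"-r\d+$", version.strip())[0]
    return tuple(int(part) for part in base.split("."))


def hits_2_1_11_bounce_bug(version: str, disabled_warnings: int) -> bool:
    # Per this thread: only 2.1.11 is affected, and only with zero warnings.
    return parse_version(version) == (2, 1, 11) and disabled_warnings == 0
```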
Yes, it could be. You stopped Mailman, which signalled OutgoingRunner to stop while it was in the middle of delivering the post. If OutgoingRunner was somehow SIGKILL'd, it would have stopped mid-delivery; when Mailman restarted, the backup out-queue entry was recovered and the post was delivered to all list members, some of whom had already received it. However, this is not what normally happens. OutgoingRunner is supposed to be SIGTERM'd and finish its current delivery. Perhaps there's something in the init.d script that SIGKILLs it if it doesn't stop soon enough, or perhaps Mailman was restarted before OutgoingRunner finished and the new OutgoingRunner 'recovered' the old runner's backup queue entry, but this would result in everyone receiving a duplicate unless something downstream of Mailman dropped the duplicate message.
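A toy model may make this failure mode clearer. It is not Mailman's Switchboard or OutgoingRunner code; it simply assumes that a queue entry is kept as a backup until the whole delivery finishes and that recovery re-delivers the whole entry. Under those assumptions, a kill partway through delivery means the members reached before the kill get the post twice. The names `deliver` and `run_with_kill` are hypothetical.

```python
from collections import Counter


def deliver(recipients, received, stop_after=None):
    """Deliver to each recipient in order; stop abruptly (like SIGKILL)
    after `stop_after` deliveries. Returns True if delivery completed."""
    for i, rcpt in enumerate(recipients):
        if stop_after is not None and i == stop_after:
            return False  # killed mid-delivery; the backup entry survives
        received[rcpt] += 1
    return True


def run_with_kill(recipients, kill_after):
    received = Counter()
    backup = list(recipients)  # backup copy of the queue entry
    if not deliver(recipients, received, stop_after=kill_after):
        # On restart, the backup entry is recovered and delivered in full.
        deliver(backup, received)
    return received


counts = run_with_kill(["a@x", "b@x", "c@x", "d@x"], kill_after=2)
print(dict(counts))  # {'a@x': 2, 'b@x': 2, 'c@x': 1, 'd@x': 1}
```

With `kill_after=None` (a graceful stop after the delivery finishes), every recipient gets exactly one copy.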
This is the 2.1.11 bug addressed in my earlier reply.
Yes, qrunners can die; just look at Mailman's qrunner and error logs. Normally, when a qrunner dies, mailmanctl automatically restarts it, up to 10 times.
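Restarting a dead runner with an upper limit can be sketched as a small supervisor loop. This only illustrates the idea and is not mailmanctl's code: `supervise` and the `run_once` callable are hypothetical, and the default limit of 10 simply mirrors the number quoted above.

```python
def supervise(run_once, max_restarts=10):
    """Call run_once() until it exits cleanly (returns 0), restarting it
    after each failure at most `max_restarts` times.
    Returns (final_status, restarts_used)."""
    restarts = 0
    while True:
        status = run_once()
        if status == 0:
            return status, restarts  # clean exit: nothing to restart
        if restarts >= max_restarts:
            return status, restarts  # give up; the runner stays dead
        restarts += 1
```

Once the limit is hit, the runner stays down until someone intervenes, which is why the qrunner and error logs are worth checking.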
Duplicates are a pain, and every effort is made to avoid or minimize them. If a runner dies due to an uncaught exception, the message is normally shunted and requires manual action to reprocess, and even this doesn't normally result in duplicates.
Duplicates can occur when a runner is killed asynchronously by a system crash, a power failure, or perhaps in your case by your init.d script, but normally a simple "mailmanctl stop|restart" should just signal the runners, and they shouldn't stop until they have finished their current task.
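The "signal, then finish the current task" behaviour is a common pattern. A minimal POSIX-only Python sketch of it (not Mailman's Runner class) follows: the handler only sets a flag, and the loop checks that flag between queue entries, so a SIGTERM never cuts a delivery off halfway. `GracefulRunner` and its `process` callback are hypothetical names. A SIGKILL, by contrast, cannot be caught at all, which is how a hard kill can leave a half-finished delivery behind.

```python
import signal


class GracefulRunner:
    def __init__(self, process):
        self.process = process  # callable that handles one queue entry
        self.stop_requested = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Don't stop here; just ask the main loop to stop when it can.
        self.stop_requested = True

    def run(self, queue):
        done = []
        for entry in queue:
            if self.stop_requested:
                break  # the flag is only checked between entries
            self.process(entry)  # a SIGTERM here only sets the flag
            done.append(entry)
        return done
```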
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Mark, thanks for your knowledgeable and _very_ helpful post!
On Wed, 2009-08-12 at 09:41 -0700, Mark Sapiro wrote:
The Gentoo init script for mailman is pretty simple. It executes, as user 'mailman', "mailmanctl -s start", "mailmanctl stop" and "mailmanctl restart" for the standard init script arguments of start, stop and restart. That's all.
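For reference, the dispatch just described can be sketched in Python, using the full path to mailmanctl as Charles suggests later in the thread. The path assumes Mailman's usual bin/ directory under the Gentoo PREFIX quoted earlier (/usr/lib64/mailman); `mailmanctl_argv` is a hypothetical helper, and actually running the command as the mailman user is left out.

```python
import os

PREFIX = "/usr/lib64/mailman"  # Gentoo PREFIX quoted in this thread
MAILMANCTL = os.path.join(PREFIX, "bin", "mailmanctl")

# Init-script action -> mailmanctl arguments, as described above.
ACTIONS = {
    "start": ["-s", "start"],  # -s: clean up a stale master lock, if any
    "stop": ["stop"],
    "restart": ["restart"],
}


def mailmanctl_argv(action):
    """Build the absolute-path argv for an init-script action."""
    try:
        return [MAILMANCTL] + ACTIONS[action]
    except KeyError:
        raise ValueError("unsupported action: %r" % action) from None
```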
Note that this bug, addressed in my earlier reply, only occurs when bounce_you_are_disabled_warnings = 0.
I found a thread on the Gentoo bug reporting list that discusses compatibility issues between Mailman 2.1.11 and Python 2.6, and possibly Python 2.5 (which I'm running on these boxes). As of yesterday, Gentoo is distributing MM 2.1.11 in stable and 2.1.12 in unstable, but they're apparently pushing to stabilize 2.1.12 ahead of schedule, since Python 2.6 is now stable in the distribution. I expect this to happen
I installed Mailman 2.1.12 from Gentoo unstable and at least the problem with non-removal of bouncing addresses seems to have gone away. Perhaps the qrunner processes will also be more stable.
Apparently something strange went down, since all the init.d script does is execute mailmanctl, as noted above.
--
Lindsay Haisley       | "The difference between a duck is because
FMP Computer Services |  one leg is both the same"
512-259-1190          |       - Anonymous
http://www.fmp.com    |

On Wed, 2009-08-12 at 13:02 -0500, Lindsay Haisley wrote:
As of today, MM 2.1.12 is in Gentoo stable.
--
Lindsay Haisley       | "In an open world,  | PGP public key
FMP Computer Services |  who needs Windows  | available at
512-259-1190          |  or Gates"          | http://pubkeys.fmp.com
http://www.fmp.com    |                     |

On Wed, 2009-08-12 at 13:30 -0500, Lindsay Haisley wrote:
As of today, MM 2.1.12 is in Gentoo stable.
I misspoke. Apparently this isn't the case yet, although I expect it to be within a week or so.
Sorry ....
--
Lindsay Haisley       | "Everything works if you let it"
FMP Computer Services |       (The Roadie)
512-259-1190          |
http://www.fmp.com    |

On 8/12/2009 2:02 PM, Lindsay Haisley wrote:
Mine stopped working, and no amount of begging on the Gentoo forums produced a fix. What finally fixed it for me was adding the full path to the mailmanctl command being issued.
http://forums.gentoo.org/viewtopic-t-641573-postdays-0-postorder-asc-highlig...
Thanks for the heads-up about the compatibility issues; I guess I'll wait a while before updating, but...
Hmmm... I just synced, and it still shows 2.1.9-r3 as current stable.
and 2.1.12 with unstable,
Confirmed.
--
Best regards,
Charles

Lindsay Haisley <fmouse-mailman@fmp.com>
Date: Tue, 11 Aug 2009 13:02:55 -0500
To: mailman-users@python.org
Cc: Slim Richey <slim@ridgerunner.com>
This is a bug introduced in 2.1.11 and fixed in 2.1.12.
You should be seeing errors in Mailman's error log too.
The attached Bouncer_patch.txt will fix it.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Participants (4):
- Lindsay Haisley
- Lindsay Haisley
- Mark Sapiro
- tanstaafl@libertytrek.org