[Mailman-Users] Bounce Processing Problems

Paul Rubin paul at rubinsoftware.com
Thu Sep 1 17:35:51 CEST 2005


Mailman 2.1.6
Python 2.4.1
27 Lists
Average 8000 people per list
Average message size 100k
Largest list gets around 200 messages per day
Total load average 2million messages per day give or take.
We have been using mailman successfully for about 4 years.
 
Basic Problem: 
  Since about 12/2004 The bounce processor consumes most of the memory in
the system and sometimes all of the CPU.
 
What we have tried:

*	Change REGISTER_BOUNCES_EVERY = minutes(1)  which change us from
complete crashing to just consuming all memory and cpu.
*	Temporarily shut off bounce processing on all lists, no effect. 
*	Temporarily reroute bounce messages to a mailbox for human
processing, We received about 5,000 per hour most of which were temporary
(mailbox full, mail server not answering, etc)
*	When it gets really bad Kill and restart mailman every 2 minutes.
Mostly works but kills all request in progress when the kill comes through,
however those request we already dead because of the locks on the lists
*	Contacting the bug forum on Source forge and being told that bounce
processing is a problem that they hope to have time to fix in 2.1.6  (see
1077587)

What top often looks like:
 
 09:39:33  up 36 days, 15:58,  9 users,  load average: 5.23, 5.75, 4.98
497 processes: 485 sleeping, 8 running, 4 zombie, 0 stopped
CPU states:  67.8% user  16.6% system   0.0% nice   0.0% iowait  15.4% idle
Mem:  1029884k av,  973668k used,   56216k free,       0k shrd,  149788k
buff
                    739156k actv,   58428k in_d,   17080k in_c
Swap: 2040244k av,  188092k used, 1852152k free                  184036k
cached
 
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
14978 mailman   16   0  324M 318M  2328 R     6.5 31.6   5:46   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 
14977 mailman   15   0 28916  27M  2420 S     0.0  2.7   0:31   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
14983 mailman   15   0 27608  25M  2320 S     0.2  2.5   0:23   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=VirginRunner:0:1 
14982 mailman   15   0 23832  22M  2460 S     0.7  2.2   0:58   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=OutgoingRunner:0:
14980 mailman   15   0 22904  21M  2364 S     0.0  2.1   0:27   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=IncomingRunner:0:
14981 mailman   15   0  5508 2568  2180 S     0.0  0.2   0:00   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
14984 mailman   15   0  5460 2548  2180 S     0.0  0.2   0:00   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=RetryRunner:0:1 -
14979 mailman   15   0  5476 2540  2180 S     0.0  0.2   0:00   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=CommandRunner:0:1
14976 mailman   25   0  5524 2220  2216 S     0.0  0.2   0:00   0
/usr/bin/python /var/mailman/bin/mailmanctl -s -q start

And immediately after a shutdown restart cycle:
 
 09:42:12  up 36 days, 16:01,  9 users,  load average: 4.21, 5.23, 4.91
532 processes: 521 sleeping, 7 running, 4 zombie, 0 stopped
CPU states:  82.3% user   6.3% system   0.0% nice   0.0% iowait  11.3% idle
Mem:  1029884k av,  643420k used,  386464k free,       0k shrd,  149556k
buff
                    435668k actv,   33664k in_d,   16752k in_c
Swap: 2040244k av,  111832k used, 1928412k free                  161048k
cached
 
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
32382 mailman   16   0 24312  23M  2660 R    27.2  2.3   0:03   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
32385 mailman   15   0 22036  21M  2616 R    23.0  2.1   0:02   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=IncomingRunner:0:
32388 mailman   15   0 21816  21M  2584 S    14.1  2.1   0:01   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=VirginRunner:0:1 
32383 mailman   16   0 20604  20M  2568 R     6.1  2.0   0:03   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 
32387 mailman   15   0 15960  15M  2712 S     0.5  1.5   0:02   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=OutgoingRunner:0:
32379 mailman   25   0  5900 5900  2596 S     0.0  0.5   0:00   0
/usr/bin/python /var/mailman/bin/mailmanctl -s -q start
32386 mailman   15   0  5840 5840  2512 S     0.0  0.5   0:00   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
32384 mailman   15   0  5804 5804  2512 S     0.0  0.5   0:00   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=CommandRunner:0:1
32389 mailman   25   0  5792 5792  2512 S     0.0  0.5   0:00   0
/usr/local/bin/python /var/mailman/bin/qrunner --runner=RetryRunner:0:1 -

Notice how the other tasks suddenly get busy....
 
Log files were completely flushed at 4PM yesterday.
 
[mailman at tbnonline ~]$ ls -al logs
total 16144
drwxrwsr-x    2 mailman  mailman      4096 Aug 31 19:19 .
drwxrwsr-x   34 mailman  mailman      4096 Sep  1 08:40 ..
-rw-rw-r--    1 mailman  mailman  15793138 Sep  1 09:02 bounce
-rw-rw-r--    1 smmsp    mailman    335919 Sep  1 09:01 error
-rw-rw-r--    1 mailman  mailman      5181 Sep  1 06:25 locks
-rw-rw-r--    1 mailman  mailman     28700 Sep  1 09:01 post
-rw-rw-r--    1 mailman  mailman     35858 Sep  1 08:32 qrunner
-rw-rw-r--    1 mailman  mailman    263590 Sep  1 09:02 smtp
-rw-rw-r--    1 mailman  mailman      1519 Sep  1 06:31 smtp-failure
-rw-rw-r--    1 mailman  mailman      6833 Sep  1 08:59 subscribe
-rw-rw-r--    1 mailman  mailman      1069 Sep  1 08:32 vette

Sample from bounce log:
 
Sep 01 08:58:30 2005 (14978) bounce message w/no discernable addresses: <
<mailto:20050901125626.19066.qmail at smtp02-01.prod.mesa1.secureserver.net>
20050901125626.19066.qmail at smtp02-01.prod.mesa1.secureserver.net>
Sep 01 08:58:30 2005 (14978) discarding unrecognized, message-id: <
<mailto:20050901125626.19066.qmail at smtp02-01.prod.mesa1.secureserver.net>
20050901125626.19066.qmail at smtp02-01.prod.mesa1.secureserver.net>
Sep 01 08:58:32 2005 (14978) bounce message w/no discernable addresses: <
<mailto:Terrace26313811125579171 at tbnonline.com>
Terrace26313811125579171 at tbnonline.com>
Sep 01 08:58:32 2005 (14978) discarding unrecognized, message-id: <
<mailto:Terrace26313811125579171 at tbnonline.com>
Terrace26313811125579171 at tbnonline.com>
Sep 01 08:58:34 2005 (14978) bounce message w/no discernable addresses: <
<mailto:0MKvAI-1EAocE2a2R-0002jN at mx.perfora.net>
0MKvAI-1EAocE2a2R-0002jN at mx.perfora.net>
Sep 01 08:58:34 2005 (14978) discarding unrecognized, message-id: <
<mailto:0MKvAI-1EAocE2a2R-0002jN at mx.perfora.net>
0MKvAI-1EAocE2a2R-0002jN at mx.perfora.net>
Sep 01 08:58:35 2005 (14978) bounce message w/no discernable addresses: <
<mailto:20050901125631.ABF4E38E49 at tbnonline.org>
20050901125631.ABF4E38E49 at tbnonline.org>
Sep 01 08:58:35 2005 (14978) discarding unrecognized, message-id: <
<mailto:20050901125631.ABF4E38E49 at tbnonline.org>
20050901125631.ABF4E38E49 at tbnonline.org>
Sep 01 08:58:39 2005 (14978) bounce message w/no discernable addresses: <
<mailto:20050901125636.19173.qmail at smtp03-01.prod.mesa1.secureserver.net>
20050901125636.19173.qmail at smtp03-01.prod.mesa1.secureserver.net>
Sep 01 08:58:39 2005 (14978) discarding unrecognized, message-id: <
<mailto:20050901125636.19173.qmail at smtp03-01.prod.mesa1.secureserver.net>
20050901125636.19173.qmail at smtp03-01.prod.mesa1.secureserver.net>

Sample from error log:
 
Sep 01 08:57:32 2005 (14978) Ignoring unparseable message:
1125579452.21312+848e219740e9c93ca6f761384c120d97593ee364
Sep 01 08:57:58 2005 (14978) Uncaught runner exception: No terminating
boundary and no trailing empty line
Sep 01 08:57:58 2005 (14978) Traceback (most recent call last):
  File "/var/mailman/Mailman/Queue/Runner.py", line 99, in _oneloop
    msg, msgdata = self._switchboard.dequeue(filebase)
  File "/var/mailman/Mailman/Queue/Switchboard.py", line 151, in dequeue
    msg = email.message_from_string(msg, Message.Message)
  File "/var/mailman/pythonlib/email/__init__.py", line 52, in
message_from_string
    return Parser(_class, strict=strict).parsestr(s)
  File "/var/mailman/pythonlib/email/Parser.py", line 75, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse
    self._parsebody(root, fp, firstbodyline)
  File "/var/mailman/pythonlib/email/Parser.py", line 240, in _parsebody
    msgobj = self.parsestr(part)
  File "/var/mailman/pythonlib/email/Parser.py", line 75, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse
    self._parsebody(root, fp, firstbodyline)
  File "/var/mailman/pythonlib/email/Parser.py", line 265, in _parsebody
    msg = self.parse(fp)
  File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse
    self._parsebody(root, fp, firstbodyline)
  File "/var/mailman/pythonlib/email/Parser.py", line 206, in _parsebody
    raise Errors.BoundaryError(
BoundaryError: No terminating boundary and no trailing empty line
 
Sep 01 08:57:58 2005 (14978) Ignoring unparseable message:
1125579478.065345+8ca74677621e4e9262be5503f2e081184abc853e

Sample from smtp log:
Sep 01 09:02:32 2005 (14982) <
<mailto:mailman.68.1125579745.14978.brdcst-compaq at tbnonline.com>
mailman.68.1125579745.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 1 recips, completed in 0.253 seconds
Sep 01 09:02:32 2005 (14982) <
<mailto:mailman.69.1125579745.14978.brdcst-compaq at tbnonline.com>
mailman.69.1125579745.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 1 recips, completed in 0.432 seconds
Sep 01 09:02:32 2005 (14982) <
<mailto:mailman.70.1125579745.14978.brdcst-compaq at tbnonline.com>
mailman.70.1125579745.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 1 recips, completed in 0.140 seconds
Sep 01 09:02:33 2005 (14982) <
<mailto:mailman.71.1125579745.14978.brdcst-compaq at tbnonline.com>
mailman.71.1125579745.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 1 recips, completed in 0.315 seconds
Sep 01 09:02:42 2005 (14982) <
<mailto:mailman.61.1125579744.14978.brdcst-printer at tbnonline.com>
mailman.61.1125579744.14978.brdcst-printer at tbnonline.com> smtp to
brdcst-printer for 2 recips, completed in 4.212 seconds
Sep 01 09:02:44 2005 (14982) <
<mailto:mailman.65.1125579744.14978.brdcst-dell at tbnonline.com>
mailman.65.1125579744.14978.brdcst-dell at tbnonline.com> smtp to brdcst-dell
for 2 recips, completed in 1.629 seconds
Sep 01 09:02:44 2005 (14982) <
<mailto:mailman.71.1125579745.14978.brdcst-compaq at tbnonline.com>
mailman.71.1125579745.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 2 recips, completed in 0.842 seconds
Sep 01 09:02:45 2005 (14982) <
<mailto:mailman.63.1125579744.14978.brdcst-printer at tbnonline.com>
mailman.63.1125579744.14978.brdcst-printer at tbnonline.com> smtp to
brdcst-printer for 2 recips, completed in 0.427 seconds
Sep 01 09:02:45 2005 (14982) <
<mailto:mailman.69.1125579745.14978.brdcst-compaq at tbnonline.com>
mailman.69.1125579745.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 2 recips, completed in 0.391 seconds
Sep 01 09:02:46 2005 (14982) <
<mailto:mailman.67.1125579744.14978.brdcst-compaq at tbnonline.com>
mailman.67.1125579744.14978.brdcst-compaq at tbnonline.com> smtp to
brdcst-compaq for 2 recips, completed in 0.374 seconds

 
 



More information about the Mailman-Users mailing list