Bugs item #1077587, was opened at 2004-12-02 09:02
Message generated for change (Comment added) made by bwarsaw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=100103&aid=1077587&group_id=103

Category: bounce detection
Group: 2.1 (stable)
Status: Open
Resolution: None
Priority: 5
Submitted By: Paul Rubin (prubin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Memory Leak in Bounce Runner

Initial Comment:
Something is going badly wrong with the BounceRunner. It is leaking memory. After it runs for a short time it has consumed hundreds of megabytes of memory. I kill it with -9 and it restarts and is fine for another couple of hours. Sometimes it does not restart and I have to stop and restart the mailman service.

This is running on a Linux box with Redhat 9, postfix 2.1.1, mailman 2.1.5 and python 2.2.2.

Below you will see ps output from right before I killed the process, and below that, output from a few seconds later.

If you will tell me what you need, I would really like to get to the bottom of this. I am killing the process about 10 times per day. If it eats too much memory before I catch it, the entire system fails.
[root@tbnonline ~]# ps -U mailman -o pid,%cpu,%mem,stime,time,vsz,args
  PID %CPU %MEM STIME     TIME    VSZ COMMAND
12949  0.0  0.0 08:05 00:00:00   7068 /usr/bin/python /var/mailman/bin/mailmanctl -s -q start
12950  0.1  0.3 08:05 00:00:02  11176 /usr/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
12951 18.9 70.8 08:05 00:08:40 931312 /usr/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
12952  0.0  0.0 08:05 00:00:00   7040 /usr/bin/python /var/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
12953  0.0  0.1 08:05 00:00:00   9256 /usr/bin/python /var/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
12954  0.0  0.1 08:05 00:00:00   7080 /usr/bin/python /var/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
12955  2.5  0.6 08:05 00:01:11  14172 /usr/bin/python /var/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
12956  0.8  0.2 08:05 00:00:24  10628 /usr/bin/python /var/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
12957  0.1  0.2 08:05 00:00:04  13272 /usr/bin/python /var/mailman/bin/qrunner --runner=RetryRunner:0:1 -s

[root@tbnonline ~]# ps -U mailman -o pid,%cpu,%mem,stime,time,vsz,args
  PID %CPU %MEM STIME     TIME    VSZ COMMAND
12949  0.0  0.1 08:05 00:00:00   7072 /usr/bin/python /var/mailman/bin/mailmanctl -s -q start
12950  0.0  0.3 08:05 00:00:02  11176 /usr/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
12952  0.0  0.0 08:05 00:00:00   7040 /usr/bin/python /var/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
12953  0.0  0.2 08:05 00:00:00   9256 /usr/bin/python /var/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
12954  0.0  0.1 08:05 00:00:00   7080 /usr/bin/python /var/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
12955  3.0  0.9 08:05 00:01:43  13584 /usr/bin/python /var/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
12956  1.2  0.6 08:05 00:00:41  10848 /usr/bin/python /var/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
12957  0.1  0.6 08:05 00:00:06  13284 /usr/bin/python /var/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
14900 29.8  1.1 08:51 00:02:47  13764 /usr/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 -s

----------------------------------------------------------------------
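
The manual check described in the initial comment (watching the ps listing and killing the runner before it exhausts memory) can be approximated by filtering the same ps output. This is only a minimal sketch: it assumes the column order of the ps command shown above, and the function name runners_over_limit and the 500000 KB threshold are hypothetical, not part of Mailman.

```python
def runners_over_limit(ps_output, max_vsz_kb):
    """Return (pid, vsz_kb, runner) for each qrunner whose VSZ exceeds the limit.

    Assumes the columns pid,%cpu,%mem,stime,time,vsz,args, in that order.
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 6)  # the 7th field keeps the full command line
        if len(fields) < 7:
            continue  # blank or truncated line
        pid, vsz, args = int(fields[0]), int(fields[5]), fields[6]
        if vsz > max_vsz_kb and "--runner=" in args:
            # "--runner=BounceRunner:0:1" -> "BounceRunner"
            runner = args.split("--runner=")[1].split(":")[0]
            hits.append((pid, vsz, runner))
    return hits


# Two rows copied from the first listing above.
sample = """\
  PID %CPU %MEM STIME     TIME    VSZ COMMAND
12950  0.1  0.3 08:05 00:00:02  11176 /usr/bin/python /var/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
12951 18.9 70.8 08:05 00:08:40 931312 /usr/bin/python /var/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
"""

print(runners_over_limit(sample, 500000))
```

Such a filter only reports the oversized runner; deciding whether to restart it is left to the operator.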
Comment By: Barry A. Warsaw (bwarsaw)
Date: 2005-02-24 10:18

Message:
Logged In: YES
user_id=12800

The problem really is that information is logged to a file and periodically that file is read and processed; however, no limit is placed on the amount of the log file that's read during any one processing loop. The bounce runner wants to read as much as possible because it will sort bounces so it can be more efficient about locking lists for bounce info updates. But if too much of the file is read then of course you get the huge memory footprint. Probably some sort of limit on the number of records read from the file is the way to go.

----------------------------------------------------------------------

Comment By: Paul Rubin (prubin)
Date: 2005-02-24 10:18

Message:
Logged In: YES
user_id=91557

OK, setting REGISTER_BOUNCES_EVERY = minutes(1) has the file hovering between 120 and 10MB, but the process is using 300MB of RAM. Does this make any sense? What is the bounce processor doing that is consuming so much memory relative to the file size?

----------------------------------------------------------------------

Comment By: Paul Rubin (prubin)
Date: 2005-02-24 10:08

Message:
Logged In: YES
user_id=91557

I will try the suggested settings...

The bounce process currently builds up about 150MB per minute when started, and after being stopped it takes about 2 minutes to exit for each minute it ran. This occurs even when postfix is completely stopped. If the file gets over 300MB then the exit time goes to 3 times; at 400MB it will crash first with out of memory.

Is it possible that there is some hangup that is causing bounce notices not to get pulled from postfix, and the same ones just keep getting pulled over and over?

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2005-02-24 07:42

Message:
Logged In: YES
user_id=12800

I have some ideas about fixing bounce runner. If I have time for 2.1.6 I'll try to attack it.
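
The record limit suggested in the 2005-02-24 10:18 comment could look roughly like the following. This is a minimal sketch and not the actual BounceRunner code: it assumes the events file is a sequence of individually pickled (listname, address) tuples, and the names read_batch and group_by_list and the batch size are hypothetical.

```python
import io
import pickle
from collections import defaultdict


def read_batch(fp, max_records):
    """Load at most max_records pickled events from fp, starting at its current position."""
    events = []
    for _ in range(max_records):
        try:
            events.append(pickle.load(fp))
        except EOFError:
            break  # no more records in the file
    return events


def group_by_list(events):
    """Group (listname, address) events so each list needs locking only once per batch."""
    grouped = defaultdict(list)
    for listname, address in events:
        grouped[listname].append(address)
    return dict(grouped)


# Demo with an in-memory file holding five hypothetical events.
buf = io.BytesIO()
for i in range(5):
    pickle.dump(("list%d" % (i % 2), "user%d@example.com" % i), buf)
buf.seek(0)

first = read_batch(buf, 3)   # bounded: only three records are held in memory
second = read_batch(buf, 3)  # the remaining two records
print(group_by_list(first))
```

Because each call resumes from the file's current position, memory use is bounded by the batch size rather than by the file size, while bounces within a batch can still be sorted per list.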

----------------------------------------------------------------------

Comment By: Tokio Kikuchi (tkikuchi)
Date: 2005-02-23 20:46

Message:
Logged In: YES
user_id=67709

Your site gets a really large number of bounces. I'd suggest trying one or two of these:

1. Set bounce_processing to "No" in the admin->bounce page.
2. Set REGISTER_BOUNCES_EVERY = minutes(1) in mm_cfg.py and process bounces before they accumulate to gigabytes.
3. Rewrite your MTA's alias file as
   your-list-bounce: yourmail@your.dom.ain
   and process the bounces manually.
4. Rewrite your MTA's alias file as
   your-list-bounce: /dev/null
   and forget about the bounces totally.

----------------------------------------------------------------------

Comment By: Paul Rubin (prubin)
Date: 2005-02-23 20:22

Message:
Logged In: YES
user_id=91557

One additional piece: the bounce processor sits at a small amount of memory until the file with the matching pid hits 1.5GB or so, and then memory starts climbing fast. If I kill the bounce processor, the file is abandoned. If I stop the mailman service, the bounce processor keeps running and eating memory. If allowed to run unchecked, it will just eat all the memory in the system. I cannot kill the bounce process without another starting, even after stopping the mailman service.

If I freshly reboot the server and let mailman run for a few minutes, the file grows; when I stop the service the file shrinks back to 0 bytes, but does not get deleted.

I hope this helps.

----------------------------------------------------------------------

Comment By: Paul Rubin (prubin)
Date: 2005-02-23 19:35

Message:
Logged In: YES
user_id=91557

Today we ran out of disk space. I have had to kill the bounce processor about eight or nine times today...
I found my disk space problem:

-rw-rw-rw- 1 mailman mailman 1.4G Feb 23 08:48 bounce-events-07208.pck
-rw-rw-rw- 1 mailman mailman 931M Feb 23 09:50 bounce-events-08307.pck
-rw-rw-rw- 1 mailman mailman 1.1G Feb 23 10:10 bounce-events-09037.pck
-rw-rw-rw- 1 mailman mailman 1.4G Feb 23 10:29 bounce-events-10251.pck
-rw-rw-rw- 1 mailman mailman 1.6G Feb 23 13:02 bounce-events-14874.pck
-rw-rw-rw- 1 mailman mailman 1.4G Feb 23 14:17 bounce-events-17525.pck
-rw-rw-rw- 1 mailman mailman 1.6G Feb 23 14:40 bounce-events-18973.pck
-rw-rw-rw- 1 mailman mailman 1.6G Feb 23 15:02 bounce-events-19879.pck
-rw-rw-rw- 1 mailman mailman 1.5G Feb 23 15:23 bounce-events-20584.pck

And about 100 more files from other days. I have saved these files and deleted the rest. Can these files tell you anything about what is going wrong?

----------------------------------------------------------------------

Comment By: Tokio Kikuchi (tkikuchi)
Date: 2004-12-16 20:29

Message:
Logged In: YES
user_id=67709

I suggest stopping automatic processing of bounces by setting the bounce_processing variable to 'No' on the admin/bounce page. It looks like your server is very busy, and the unsubscribing process triggered by the bounce score may be interfering. You may also have to unsubscribe the problematic members manually.

----------------------------------------------------------------------

Comment By: Paul Rubin (prubin)
Date: 2004-12-15 12:29

Message:
Logged In: YES
user_id=91557

I do not have a specific number of bounces per day. We send around 500,000 messages per day, with peak days around 1,000,000. We know that we have some bad addresses, but at any given time there should not be more than 5,000 bounce notices per day (full mailboxes are common).

As far as I can tell only certain bounce notices cause the leak. We can go hours or days with memory being almost flat, then suddenly 200M vanishes in 2 or 3 minutes.
Is there any way I could 'hack' the code to somehow grab the information about which bounce notice is causing the problem? Or to capture all bounce notices in some out-of-the-way space that I could tar up and send to you for testing? If nothing else, I could edit the aliases to copy all of the messages to a file and zip it up for you after a few days.

Does any of this make any sense?

----------------------------------------------------------------------

Comment By: Tokio Kikuchi (tkikuchi)
Date: 2004-12-14 19:19

Message:
Logged In: YES
user_id=67709

My FreeBSD4.7/Solaris8 installations have no such problems. How many bounces do you get on this system?

----------------------------------------------------------------------