
An update - I've upgraded to the latest stable Python (2.5.2) and it's made no difference to the process growth.

Config:
  Solaris 10 x86
  Python 2.5.2
  Mailman 2.1.9 (8 incoming queue runners - the leak rate increases with this number)
  SpamAssassin 3.2.5
At this point I am looking for ways to isolate the suspected memory leak - one approach I am looking at is dtrace: http://blogs.sun.com/sanjeevb/date/200506
Any other tips appreciated!
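One dtrace experiment I have in mind (assuming the growth shows up as ordinary heap allocations) is to attach the pid provider to one of the growing qrunners and aggregate the malloc request sizes, roughly:

    dtrace -n 'pid$target::malloc:entry { @sizes = quantize(arg0); }' -p <pid of a growing qrunner>

Comparing that size distribution against a freshly restarted runner should at least tell me whether the growth is malloc-driven or coming from somewhere else (mmap, caches that never get released, etc.).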
Initial (immediately after a /etc/init.d/mailman restart):

last pid: 10330;  load averages: 0.45, 0.19, 0.15    09:13:33
93 processes: 92 sleeping, 1 on cpu
CPU states: 98.6% idle, 0.4% user, 1.0% kernel, 0.0% iowait, 0.0% swap
Memory: 1640M real, 1160M free, 444M swap in use, 2779M swap free
  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
10314 mailman    1  59    0 9612K 7132K sleep    0:00  0.35% python
10303 mailman    1  59    0 9604K 7080K sleep    0:00  0.15% python
10305 mailman    1  59    0 9596K 7056K sleep    0:00  0.14% python
10304 mailman    1  59    0 9572K 7036K sleep    0:00  0.14% python
10311 mailman    1  59    0 9572K 7016K sleep    0:00  0.13% python
10310 mailman    1  59    0 9572K 7016K sleep    0:00  0.13% python
10306 mailman    1  59    0 9556K 7020K sleep    0:00  0.14% python
10302 mailman    1  59    0 9548K 6940K sleep    0:00  0.13% python
10319 mailman    1  59    0 9516K 6884K sleep    0:00  0.15% python
10312 mailman    1  59    0 9508K 6860K sleep    0:00  0.12% python
10321 mailman    1  59    0 9500K 6852K sleep    0:00  0.14% python
10309 mailman    1  59    0 9500K 6852K sleep    0:00  0.13% python
10307 mailman    1  59    0 9500K 6852K sleep    0:00  0.13% python
10308 mailman    1  59    0 9500K 6852K sleep    0:00  0.12% python
10313 mailman    1  59    0 9500K 6852K sleep    0:00  0.12% python
After 8 hours:

last pid: 9878;  load averages: 0.14, 0.12, 0.13    09:12:18
97 processes: 96 sleeping, 1 on cpu
CPU states: 97.2% idle, 1.2% user, 1.6% kernel, 0.0% iowait, 0.0% swap
Memory: 1640M real, 179M free, 2121M swap in use, 1100M swap free
  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
10123 mailman    1  59    0  314M  311M sleep    1:57  0.02% python
10131 mailman    1  59    0  310M  307M sleep    1:35  0.01% python
10124 mailman    1  59    0  309M   78M sleep    0:45  0.10% python
10134 mailman    1  59    0  307M   81M sleep    1:27  0.01% python
10125 mailman    1  59    0  307M   79M sleep    0:42  0.01% python
10133 mailman    1  59    0   44M   41M sleep    0:14  0.01% python
10122 mailman    1  59    0   34M   30M sleep    0:43  0.39% python
10127 mailman    1  59    0   31M   27M sleep    0:40  0.26% python
10130 mailman    1  59    0   30M   26M sleep    0:15  0.03% python
10129 mailman    1  59    0   28M   24M sleep    0:19  0.10% python
10126 mailman    1  59    0   28M   25M sleep    1:07  0.59% python
10132 mailman    1  59    0   27M   24M sleep    1:00  0.46% python
10128 mailman    1  59    0   27M   24M sleep    0:16  0.01% python
10151 mailman    1  59    0 9516K 3852K sleep    0:05  0.01% python
10150 mailman    1  59    0 9500K 3764K sleep    0:00  0.00% python
On 6/23/08 8:55 PM, "Fletcher Cocquyt" <fcocquyt@stanford.edu> wrote:
Mike, many thanks for your (as always) very helpful response - I added the one-liner to mm_cfg.py to increase the number of incoming runners to 16. Now I am observing (via memory trend graphs) an acceleration of what looks like a memory leak - maybe from Python, currently at 2.4.
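For reference, the one-liner in mm_cfg.py was along the lines of Mark's QRUNNERS suggestion below, just with 16 slices:

    QRUNNERS[QRUNNERS.index(('IncomingRunner', 1))] = ('IncomingRunner', 16)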
I am compiling the latest 2.5.2 to see if that helps - for now the workaround is to restart mailman occasionally.
(and yes the spamassassin checks are the source of the 4-10 second delay - now those happen in parallel x16 - so no spikes in the backlog...)
Thanks again
On 6/20/08 9:01 AM, "Mark Sapiro" <mark@msapiro.net> wrote:
Fletcher Cocquyt wrote:
Hi, I am observing periods of qfiles/in backlogs in the 400-600 message range that take 1-2 hours to clear with standard Mailman 2.1.9 + SpamAssassin (the vette log shows these messages processing at an average of ~10 seconds each).
Is SpamAssassin invoked from Mailman or from the MTA before Mailman? If this is plain Mailman, 10 seconds is a hugely long time to process a single post through IncomingRunner.
If you have some SpamAssassin interface like
<http://sourceforge.net/tracker/index.php?func=detail&aid=640518&group_id=103&atid=300103>
that calls spamd from a Mailman handler, you might consider moving SpamAssassin ahead of Mailman and using something like
<http://sourceforge.net/tracker/index.php?func=detail&aid=840426&group_id=103&atid=300103>
or just header_filter_rules instead.
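As a rough sketch (the exact rule format is from memory - check Mailman/Handlers/SpamDetect.py for the authoritative tuple layout, and the X-Spam-Flag header depends on how your SpamAssassin tags messages), a header_filter_rules entry that discards anything the MTA-side SpamAssassin has already flagged could be added per list with bin/withlist:

    # save as add_spam_rule.py, then run:
    #   bin/withlist -l -r add_spam_rule <listname>
    from Mailman import mm_cfg

    def add_spam_rule(mlist):
        # discard posts the MTA-level SpamAssassin already tagged
        mlist.header_filter_rules.append(
            (r'^X-Spam-Flag:\s*YES', mm_cfg.DISCARD, False))
        mlist.Save()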
Is there an easy way to parallelize what looks like a single serialized Mailman queue? I see some posts re: multi-slice but nothing definitive
See the section of Defaults.py headed with
#####
# Qrunner defaults
#####
In order to run multiple, parallel IncomingRunner processes, you can either copy the entire QRUNNERS definition from Defaults.py to mm_cfg.py and change
('IncomingRunner', 1), # posts from the outside world
to
('IncomingRunner', 4), # posts from the outside world
which says run 4 IncomingRunner processes, or you can just add something like
QRUNNERS[QRUNNERS.index(('IncomingRunner',1))] = ('IncomingRunner',4)
to mm_cfg.py. You can use any power of two for the number.
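For reference, the stock definition (this copy is from a 2.1.9-era Defaults.py, so double check it against your own) with IncomingRunner bumped to 4 would read:

    QRUNNERS = [
        ('ArchRunner',     1), # messages for the archiver
        ('BounceRunner',   1), # for processing the qfile/bounces directory
        ('CommandRunner',  1), # commands and bounces from the outside world
        ('IncomingRunner', 4), # posts from the outside world
        ('NewsRunner',     1), # outgoing messages to the nntpd
        ('OutgoingRunner', 1), # outgoing messages to the smtpd
        ('VirginRunner',   1), # internally crafted (virgin birth) messages
        ('RetryRunner',    1), # retry temporarily failed deliveries
        ]

The power of two requirement is there because, as I recall, each runner slice claims a fixed fraction of the hash space used to name the queue entries.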
I would also like the option of working this into an overall load-balancing scheme where I have multiple SMTP nodes behind an F5 load balancer and the nodes share an NFS backend...
The following search will return some information.
<http://www.google.com/search?q=site%3Amail.python.org++inurl%3Amailman++%22load+balancing%22>
--
Fletcher Cocquyt
Senior Systems Administrator
Information Resources and Technology (IRT)
Stanford University School of Medicine
Email: fcocquyt@stanford.edu  Phone: (650) 724-7485