[Mailman-Users] Python process size grows 30x in 8 hours (memory leak?)
Fletcher Cocquyt
fcocquyt at stanford.edu
Tue Jul 1 18:19:10 CEST 2008
An update: I've upgraded to the latest stable Python (2.5.2) and it has made
no difference to the process growth:
Config:
Solaris 10 x86
Python 2.5.2
Mailman 2.1.9 (8 IncomingRunner queue runners; the leak rate increases with the number of runners)
SpamAssassin 3.2.5
At this point I am trying to isolate the suspected memory leak, and I am
looking at using DTrace: http://blogs.sun.com/sanjeevb/date/200506
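Something along these lines (untested) is what I have in mind: attach the DTrace
pid provider to one of the growing python runner processes and aggregate malloc
request sizes by user stack, which should show where the growth is coming from:

  dtrace -n 'pid$target::malloc:entry { @bytes[ustack()] = sum(arg0); }' -p <runner pid>

Ctrl-C prints the aggregation; <runner pid> is one of the python PIDs shown in the
top output below.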
Any other tips appreciated!
Initial (immediately after a /etc/init.d/mailman restart):
last pid: 10330; load averages: 0.45, 0.19, 0.15    09:13:33
93 processes: 92 sleeping, 1 on cpu
CPU states: 98.6% idle, 0.4% user, 1.0% kernel, 0.0% iowait, 0.0% swap
Memory: 1640M real, 1160M free, 444M swap in use, 2779M swap free
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
10314 mailman 1 59 0 9612K 7132K sleep 0:00 0.35% python
10303 mailman 1 59 0 9604K 7080K sleep 0:00 0.15% python
10305 mailman 1 59 0 9596K 7056K sleep 0:00 0.14% python
10304 mailman 1 59 0 9572K 7036K sleep 0:00 0.14% python
10311 mailman 1 59 0 9572K 7016K sleep 0:00 0.13% python
10310 mailman 1 59 0 9572K 7016K sleep 0:00 0.13% python
10306 mailman 1 59 0 9556K 7020K sleep 0:00 0.14% python
10302 mailman 1 59 0 9548K 6940K sleep 0:00 0.13% python
10319 mailman 1 59 0 9516K 6884K sleep 0:00 0.15% python
10312 mailman 1 59 0 9508K 6860K sleep 0:00 0.12% python
10321 mailman 1 59 0 9500K 6852K sleep 0:00 0.14% python
10309 mailman 1 59 0 9500K 6852K sleep 0:00 0.13% python
10307 mailman 1 59 0 9500K 6852K sleep 0:00 0.13% python
10308 mailman 1 59 0 9500K 6852K sleep 0:00 0.12% python
10313 mailman 1 59 0 9500K 6852K sleep 0:00 0.12% python
After 8 hours:
last pid: 9878; load averages: 0.14, 0.12, 0.13    09:12:18
97 processes: 96 sleeping, 1 on cpu
CPU states: 97.2% idle, 1.2% user, 1.6% kernel, 0.0% iowait, 0.0% swap
Memory: 1640M real, 179M free, 2121M swap in use, 1100M swap free
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
10123 mailman 1 59 0 314M 311M sleep 1:57 0.02% python
10131 mailman 1 59 0 310M 307M sleep 1:35 0.01% python
10124 mailman 1 59 0 309M 78M sleep 0:45 0.10% python
10134 mailman 1 59 0 307M 81M sleep 1:27 0.01% python
10125 mailman 1 59 0 307M 79M sleep 0:42 0.01% python
10133 mailman 1 59 0 44M 41M sleep 0:14 0.01% python
10122 mailman 1 59 0 34M 30M sleep 0:43 0.39% python
10127 mailman 1 59 0 31M 27M sleep 0:40 0.26% python
10130 mailman 1 59 0 30M 26M sleep 0:15 0.03% python
10129 mailman 1 59 0 28M 24M sleep 0:19 0.10% python
10126 mailman 1 59 0 28M 25M sleep 1:07 0.59% python
10132 mailman 1 59 0 27M 24M sleep 1:00 0.46% python
10128 mailman 1 59 0 27M 24M sleep 0:16 0.01% python
10151 mailman 1 59 0 9516K 3852K sleep 0:05 0.01% python
10150 mailman 1 59 0 9500K 3764K sleep 0:00 0.00% python
On 6/23/08 8:55 PM, "Fletcher Cocquyt" <fcocquyt at stanford.edu> wrote:
> Mark, many thanks for your (as always) very helpful response - I added the
> one-liner to mm_cfg.py to increase the IncomingRunner processes to 16.
> Now I am observing (via memory trend graphs) an acceleration of what looks
> like a memory leak, possibly from Python (currently at 2.4).
>
> I am compiling the latest 2.5.2 to see if that helps - for now the workaround
> is to restart mailman occasionally.
>
> (and yes, the SpamAssassin checks are the source of the 4-10 second delay; now
> those happen in parallel across 16 runners, so there are no spikes in the backlog...)
>
> Thanks again
>
>
> On 6/20/08 9:01 AM, "Mark Sapiro" <mark at msapiro.net> wrote:
>
>> Fletcher Cocquyt wrote:
>>
>>> Hi, I am observing periods of qfiles/in backlogs in the 400-600 message
>>> count range that take 1-2 hours to clear with the standard Mailman 2.1.9 +
>>> SpamAssassin (the vette log shows these messages process in an average of
>>> ~10 seconds each)
>>
>>
>> Is SpamAssassin invoked from Mailman or from the MTA before Mailman? If
>> this is plain Mailman, 10 seconds is an extremely long time to process a
>> single post through IncomingRunner.
>>
>> If you have some SpamAssassin interface like
>> <http://sourceforge.net/tracker/index.php?func=detail&aid=640518&group_id=103&atid=300103>
>> that calls spamd from a Mailman handler, you might consider moving
>> SpamAssassin ahead of Mailman and using something like
>> <http://sourceforge.net/tracker/index.php?func=detail&aid=840426&group_id=103&atid=300103>
>> or just header_filter_rules instead.
>>
>>
>>> Is there an easy way to parallelize what looks like a single serialized
>>> Mailman queue?
>>> I see some posts re: multi-slice but nothing definitive
>>
>>
>> See the section of Defaults.py headed with
>>
>> #####
>> # Qrunner defaults
>> #####
>>
>> In order to run multiple, parallel IncomingRunner processes, you can
>> either copy the entire QRUNNERS definition from Defaults.py to
>> mm_cfg.py
>> and change
>>
>> ('IncomingRunner', 1), # posts from the outside world
>>
>> to
>>
>> ('IncomingRunner', 4), # posts from the outside world
>>
>>
>> which says run 4 IncomingRunner processes, or you can just add
>> something like
>>
>> QRUNNERS[QRUNNERS.index(('IncomingRunner',1))] = ('IncomingRunner',4)
>>
>> to mm_cfg.py. You can use any power of two for the number.
>>
>>
>>> I would also like the option of working this into an overall load-balancing
>>> scheme where I have multiple SMTP nodes behind an F5 load balancer and the
>>> nodes share an NFS backend...
>>
>>
>> The following search will return some information.
>>
>>
>> <http://www.google.com/search?q=site%3Amail.python.org++inurl%3Amailman++%22load+balancing%22>
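For the archives: the mm_cfg.py change, written out in the copy-the-whole-definition
form Mark describes above (assuming the stock 2.1.9 runner list from Defaults.py) and
scaled to the 8 IncomingRunner processes I am currently running, would look roughly
like this:

QRUNNERS = [
    ('ArchRunner',     1), # messages for the archiver
    ('BounceRunner',   1), # for processing the qfiles/bounces directory
    ('CommandRunner',  1), # commands and bounces from the outside world
    ('IncomingRunner', 8), # posts from the outside world
    ('NewsRunner',     1), # outgoing messages to the nntpd
    ('OutgoingRunner', 1), # outgoing messages to the smtpd
    ('VirginRunner',   1), # internally crafted (virgin birth) messages
    ('RetryRunner',    1), # retry temporarily failed deliveries
    ]

The one-liner form Mark shows is equivalent; either way the IncomingRunner count
has to be a power of two.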
--
Fletcher Cocquyt
Senior Systems Administrator
Information Resources and Technology (IRT)
Stanford University School of Medicine
Email: fcocquyt at stanford.edu
Phone: (650) 724-7485