
On 7/1/08 3:37 PM, "Mark Sapiro" <mark@msapiro.net> wrote:
Fletcher Cocquyt wrote:
Not finding a "leak" reference - save an admindb one that is irrelevant to this runner issue.
Nothing has been done in Mailman to fix any memory leaks. As far as I know, nothing has been done to create any either.
Ok, thanks for confirming that - I will not prioritize a Mailman 2.1.9 -> 2.1.11 upgrade.
If there is a leak, it is most likely in the underlying Python and not a Mailman issue per se.
Agreed - hence my first priority was upgrading from Python 2.4.x to 2.5.2 (the latest on python.org) - but upgrading did not help this.
I am curious. You say this problem was exacerbated when you went from one IncomingRunner to eight (sliced) IncomingRunners. The IncomingRunner instances themselves should be processing fewer messages each, and I would expect them to leak less. The other runners are doing the same as before so I would expect them to be the same unless by solving your 'in' queue backlog, you're just handling a whole lot more messages.
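For readers unfamiliar with slicing: each sliced runner owns a fixed share of the hash space of queue-entry names, so eight IncomingRunner slices should each see roughly one eighth of the traffic. A minimal illustrative sketch of the idea (not Mailman's actual Switchboard code - the simple modulo assignment here is an assumption for illustration):

```python
import hashlib

def slice_for(filename, nslices=8):
    # Illustrative only: map a queue-file name into one of nslices
    # buckets by partitioning the SHA hash space, so a runner started
    # as e.g. IncomingRunner:5:8 handles only its own share.
    digest = int(hashlib.sha1(filename.encode()).hexdigest(), 16)
    return digest % nslices

# With a uniform hash, 8000 queue entries spread roughly evenly,
# about 1000 per slice - each sliced runner processes fewer messages.
counts = [0] * 8
for i in range(8000):
    counts[slice_for('msg-%d' % i)] += 1
```

This is why Mark expects each sliced IncomingRunner to leak less, not more, than a single unsliced one - unless total traffic went up once the backlog cleared.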
Also, in an 8 hour period, I would expect that RetryRunner and CommandRunner and, unless you are doing a lot of mail -> news gatewaying, NewsRunner to have done virtually nothing.
In this snapshot:
PID   USERNAME LWP PRI NICE  SIZE   RES   STATE  TIME  CPU   COMMAND
10123 mailman    1  59    0  314M   311M  sleep  1:57  0.02% python
10131 mailman    1  59    0  310M   307M  sleep  1:35  0.01% python
10124 mailman    1  59    0  309M    78M  sleep  0:45  0.10% python
10134 mailman    1  59    0  307M    81M  sleep  1:27  0.01% python
10125 mailman    1  59    0  307M    79M  sleep  0:42  0.01% python
10133 mailman    1  59    0   44M    41M  sleep  0:14  0.01% python
10122 mailman    1  59    0   34M    30M  sleep  0:43  0.39% python
10127 mailman    1  59    0   31M    27M  sleep  0:40  0.26% python
10130 mailman    1  59    0   30M    26M  sleep  0:15  0.03% python
10129 mailman    1  59    0   28M    24M  sleep  0:19  0.10% python
10126 mailman    1  59    0   28M    25M  sleep  1:07  0.59% python
10132 mailman    1  59    0   27M    24M  sleep  1:00  0.46% python
10128 mailman    1  59    0   27M    24M  sleep  0:16  0.01% python
10151 mailman    1  59    0 9516K  3852K  sleep  0:05  0.01% python
10150 mailman    1  59    0 9500K  3764K  sleep  0:00  0.00% python
Which processes correspond to which runners? And why are the two processes that have apparently done the least the ones that have grown the most?
In fact, why are none of these 15 PIDs the same as the ones from 8 hours earlier - or was that snapshot actually taken after the above were restarted?

Yes - I snapshotted the current leaked state, then restarted and snapped the new PIDs to show the size difference.
Here is the current leaked state since the cron 13:27 restart only 3 hours ago:

last pid: 20867;  load averages: 0.53, 0.47, 0.24    16:04:15
91 processes: 90 sleeping, 1 on cpu
CPU states: 99.1% idle, 0.3% user, 0.6% kernel, 0.0% iowait, 0.0% swap
Memory: 1640M real, 77M free, 1509M swap in use, 1699M swap free
PID   USERNAME LWP PRI NICE  SIZE   RES   STATE  TIME  CPU   COMMAND
24167 mailman    1  59    0  311M   309M  sleep  0:28  0.02% python
24158 mailman    1  59    0  308M   305M  sleep  0:30  0.01% python
24169 mailman    1  59    0  303M   301M  sleep  0:28  0.01% python
24165 mailman    1  59    0   29M    27M  sleep  0:09  0.03% python
24161 mailman    1  59    0   29M    27M  sleep  0:12  0.07% python
24164 mailman    1  59    0   28M    26M  sleep  0:07  0.01% python
24172 mailman    1  59    0   26M    24M  sleep  0:04  0.01% python
24160 mailman    1  59    0   26M    24M  sleep  0:08  0.01% python
24162 mailman    1  59    0   26M    23M  sleep  0:10  0.01% python
24166 mailman    1  59    0   26M    23M  sleep  0:04  0.01% python
24171 mailman    1  59    0   25M    23M  sleep  0:04  0.02% python
24163 mailman    1  59    0   24M    22M  sleep  0:04  0.01% python
24168 mailman    1  59    0   19M    17M  sleep  0:03  0.02% python
24170 mailman    1  59    0 9516K  6884K  sleep  0:01  0.01% python
24159 mailman    1  59    0 9500K  6852K  sleep  0:00  0.00% python
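The snapshot above already implies a rough leak rate. A back-of-the-envelope sketch - the 29M fresh-runner baseline and the ~2.6 hour window (13:27 restart to the 16:04 snapshot) are read off the listings in this thread, not measured precisely:

```python
def parse_size_mb(size):
    # Convert a top(1) SIZE field such as '311M' or '9516K' to MB.
    value = float(size[:-1])
    return value / 1024.0 if size[-1].upper() == 'K' else value

def growth_mb_per_hour(now, baseline, hours):
    # Rough leak rate: growth above a freshly restarted runner's size.
    return (parse_size_mb(now) - parse_size_mb(baseline)) / hours

# IncomingRunner:5:8 is at 311M roughly 2.6 hours after the 13:27
# restart; a freshly restarted IncomingRunner starts near 29M,
# giving on the order of 100 MB/hour of growth in the worst slice.
rate = growth_mb_per_hour('311M', '29M', 2.6)
```

At that rate the 1640M box exhausts real memory well before the next cron restart, which matches the swap pressure shown in the summary line.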
And the mapping to the runners:

god@irt-smtp-02:mailman-2.1.11 4:16pm 66 # /usr/ucb/ps auxw | egrep mailman | awk '{print $2 " " $11}'
24167 --runner=IncomingRunner:5:8
24165 --runner=BounceRunner:0:1
24158 --runner=IncomingRunner:7:8
24162 --runner=VirginRunner:0:1
24163 --runner=IncomingRunner:1:8
24166 --runner=IncomingRunner:0:8
24168 --runner=IncomingRunner:4:8
24169 --runner=IncomingRunner:2:8
24171 --runner=IncomingRunner:6:8
24172 --runner=IncomingRunner:3:8
24160 --runner=CommandRunner:0:1
24161 --runner=OutgoingRunner:0:1
24164 --runner=ArchRunner:0:1
24170 /bin/python
24159 /bin/python
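Joining the two listings by PID makes the pattern explicit: only three of the eight IncomingRunner slices account for almost all of the memory, while the other runners stay in the 19-29M range. A quick sketch - the PID-to-runner and PID-to-SIZE tables are transcribed from the output above; the join itself is the only new part:

```python
# PID -> runner name, from the /usr/ucb/ps mapping above.
runners = {
    24167: 'IncomingRunner:5:8', 24158: 'IncomingRunner:7:8',
    24169: 'IncomingRunner:2:8', 24163: 'IncomingRunner:1:8',
    24166: 'IncomingRunner:0:8', 24168: 'IncomingRunner:4:8',
    24171: 'IncomingRunner:6:8', 24172: 'IncomingRunner:3:8',
    24165: 'BounceRunner:0:1',   24160: 'CommandRunner:0:1',
    24161: 'OutgoingRunner:0:1', 24162: 'VirginRunner:0:1',
    24164: 'ArchRunner:0:1',
}
# PID -> SIZE in MB, from the top(1) snapshot above.
sizes_mb = {
    24167: 311, 24158: 308, 24169: 303, 24165: 29, 24161: 29,
    24164: 28, 24172: 26, 24160: 26, 24162: 26, 24166: 26,
    24171: 25, 24163: 24, 24168: 19,
}
# Sort runners by memory to show where the growth is concentrated.
by_size = sorted(runners, key=lambda pid: sizes_mb[pid], reverse=True)
for pid in by_size:
    print('%-22s %4dM' % (runners[pid], sizes_mb[pid]))
```

The top three are IncomingRunner slices 5, 7 and 2 - which sharpens Mark's question: if the hash slicing spreads traffic evenly, why have only some slices ballooned?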
Thanks for the analysis, Fletcher