Mailman chewing up resources
Hello,
We run a VPS running Debian with Mailman 2.1.11 and Postfix. Python is 2.5.2.
Up until a couple of weeks ago, we were running a number of quite small mailing lists (max 200 members) with no problems. But then we imported the subscriber list of a list that we'd moved from Yahoogroups that contained 800+ members, and that's when the trouble started.
This list not only has 800 or so members, but it's a fairly active list.
We also have quite a number of deferred messages from Yahoo in the queue, but that's a different story. We've just implemented DKIM; hopefully that will help sort that one out.
The problem we're seeing is that if the list gets at all busy, the memory usage of the Python process that runs Mailman skyrockets, dragging the system to a crawl.
We're a little clueless as to how to debug this further, so any help would be appreciated.
Geoff.
Geoff Shang wrote:
> The problem we're seeing is that if the list gets at all busy, the memory usage of the Python process that runs Mailman skyrockets, dragging the system to a crawl.
Which Python process? mailmanctl should not be affected. Beyond that, there are 8 qrunner processes. Are they all affected, or just one or two?
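One way to answer that is to list the resident memory of each qrunner. The following is only a sketch, written for a current Python 3 rather than the 2.5.2 on the server. It assumes a `ps` that accepts `-eo pid,rss,args` and that each runner's command line carries a `--runner=Name:...` argument; the helper names `qrunner_rss` and `print_report` are made up for illustration.

```python
import re
import subprocess

# Assumption: each qrunner is started with an argument like
# "--runner=OutgoingRunner:0:1" that identifies which runner it is.
RUNNER_RE = re.compile(r"--runner=(\w+)")


def qrunner_rss(ps_output):
    """Parse `ps -eo pid,rss,args` text into {(pid, runner_name): rss_kb}."""
    usage = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        match = RUNNER_RE.search(parts[2])
        if match:
            usage[(int(parts[0]), match.group(1))] = int(parts[1])
    return usage


def print_report():
    """Run ps and print the qrunners, largest resident set first."""
    out = subprocess.check_output(["ps", "-eo", "pid,rss,args"]).decode()
    rows = sorted(qrunner_rss(out).items(), key=lambda item: -item[1])
    for (pid, name), rss in rows:
        print("%6d  %-18s %8d kB" % (pid, name, rss))
```

Calling `print_report()` a few times while the list is busy should show whether every runner grows or only one or two of them.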
> We're a little clueless as to how to debug this further, so any help would be appreciated.
There is an issue that affects memory usage in the qrunners. They keep a cache of list objects in memory to reduce disk IO. The cache is supposed to free the space used by a list object when there are no more references to that object, but it turns out there is a self-reference in the list objects, so the cache simply grows until it holds a copy of each list.
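The effect of a self-reference on a cache like that can be shown in isolation. This is a sketch only: it assumes the cache behaves like a `weakref.WeakValueDictionary`, uses a stand-in class `FakeList` rather than Mailman's real list class, and turns the cycle collector off so that only reference counting is at work. It is written for a current CPython 3.

```python
import gc
import weakref


class FakeList(object):
    """Stand-in for a list object; this is not Mailman's MailList class."""

    def __init__(self, name, self_reference):
        self.name = name
        if self_reference:
            self.me = self  # reference cycle: the object points at itself


def survives_in_cache(self_reference):
    """Return True if the entry is still cached after the caller drops its reference."""
    cache = weakref.WeakValueDictionary()
    obj = FakeList("test-list", self_reference)
    cache["test-list"] = obj
    del obj  # the only reference held from outside the object is gone
    return "test-list" in cache


gc.disable()  # look at reference counting alone, not the cycle collector
try:
    plain_survives = survives_in_cache(False)
    cyclic_survives = survives_in_cache(True)
finally:
    gc.enable()

print("without self-reference, still cached:", plain_survives)
print("with self-reference, still cached:", cyclic_survives)
```

Under CPython the object without the self-reference should leave the cache as soon as its last reference is dropped, while the self-referencing one stays. Whether and when the cycle collector would eventually reclaim such objects inside a long-running qrunner is not something this sketch settles.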
There are other issues in that large messages can cause the runners that handle them to grow, and Python's memory management is such that Python itself never gives freed memory back to the OS. Memory can be freed within Python and it will be available for reuse within that process, but it is not given back to the OS.
I recommend disabling the list cache within the qrunners. This was done for the now defunct 2.2 branch, but has not been done on the 2.1 branch.
The attached Runner.patch.txt file contains a patch to do this. I suggest you try the patch and see if that helps.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On Fri, 12 Mar 2010, Mark Sapiro wrote:
> There is an issue that affects memory usage in the qrunners. They keep a cache of list objects in memory to reduce disk IO. The cache is supposed to free the space used by a list object when there are no more references to that object, but it turns out there is a self-reference in the list objects, so the cache simply grows until it holds a copy of each list.
> There are other issues in that large messages can cause the runners that handle them to grow, and Python's memory management is such that Python itself never gives freed memory back to the OS. Memory can be freed within Python and it will be available for reuse within that process, but it is not given back to the OS.
> I recommend disabling the list cache within the qrunners. This was done for the now defunct 2.2 branch, but has not been done on the 2.1 branch.
> The attached Runner.patch.txt file contains a patch to do this. I suggest you try the patch and see if that helps.
Thanks for this.
I patched Mailman/Queue/Runner.py and the patch applied cleanly. I even checked that the patch had applied.
I stopped and restarted Mailman.
But Runner.pyc wasn't updated.
I deleted Runner.pyc in order to make sure that the old one wasn't being used, but 17 hours on, Runner.pyc hasn't been recreated.
The system seems to be running fine from what I've seen, but I'm curious to know if the file I patched is in fact being used at all.
Geoff.
Geoff Shang wrote:
> I patched Mailman/Queue/Runner.py and the patch applied cleanly. I even checked that the patch had applied.
> I stopped and restarted Mailman.
> But Runner.pyc wasn't updated.
> I deleted Runner.pyc in order to make sure that the old one wasn't being used, but 17 hours on, Runner.pyc hasn't been recreated.
> The system seems to be running fine from what I've seen, but I'm curious to know if the file I patched is in fact being used at all.
Assuming the file you patched is in the Mailman/Queue/ directory that is actually being used by Mailman, you are getting the patched module.
This is a permissions issue. The qrunners do not have permission to write Mailman/Queue/Runner.pyc. When Runner is imported, Python detects that Runner.py is newer than Runner.pyc (or there is no Runner.pyc) and loads and compiles Runner.py. It then attempts to write Runner.pyc, but if it fails, it just goes on.
The Mailman/Queue/ directory should belong to Mailman's group and be group writable and SETGID, and all the files in it should belong to Mailman's group and be group writable.
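The mode bits can be checked from Python as well as from `ls -ld`. A minimal sketch, assuming a current Python 3; the helper name `check_queue_dir_mode` is made up, and it looks only at the mode, not at which group owns the directory.

```python
import stat


def check_queue_dir_mode(mode):
    """Return a list of problems found in a directory's st_mode (empty if none)."""
    problems = []
    if not stat.S_ISDIR(mode):
        problems.append("not a directory")
    if not mode & stat.S_IWGRP:
        problems.append("not group writable")
    if not mode & stat.S_ISGID:
        problems.append("setgid bit not set")
    return problems
```

It would be called with the result of `os.stat(path).st_mode`, where `path` is wherever Mailman/Queue/ lives on the system in question. An empty list means the directory mode matches the description above; group ownership still has to be checked separately.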
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan