suggestion for smarter garbage collection as a function of size (gc.set_collect_mem_growth(2))
Hello, I ran into a problem recently with a ReconnectingClientFactory in Twisted while writing some spare-time software, and it turned out to be a gc inefficiency. In short, the protocol's memory wasn't released after the reconnect, and the protocol had about 50M attached to it. So with frequent reconnects the RSS of the task grew to >1G and pushed the system into heavy swap. In reality only 50M were allocated, so 950M were wasted by python. An explicit gc.collect() invocation at every retry of the factory fixed it. However, this means there is significant room for improvement in the default behaviour of the python gc.

In short, the real bug is that the python gc doesn't take size into account in any way. It doesn't matter how many objects there are; what matters is their size! The whole story can be found in this thread: http://twistedmatrix.com/pipermail/twisted-python/2005-December/012233.html

My suggestion to fix this problem in autopilot mode (without requiring an explicit gc.collect()) is to invoke gc.collect() (or anyway to go all the way down, freeing everything possible) at least once every time the amount of anonymous memory allocated by the interpreter doubles. The tunable should be a float >= 1. When the tunable is 1 the feature is disabled (so it works like current python today). The default should be 2 (which means invoking gc.collect() after a 100% increase of the anonymous memory allocated by the interpreter). We could also have yet another threshold that sets a minimum amount of RAM after which this size-based heuristic kicks in, but that's not very important and may not be worth it (when little memory is involved, gc.collect() should be fast anyway).

To implement this we need two hooks, in the malloc and free that allocate python objects. Then we have to store the minimum of this value (i.e. the last minimum of memory allocated by the interpreter).
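A minimal illustration of the failure mode described above (not the original Twisted code, just a hypothetical stand-in): an object that holds a large buffer and sits in a reference cycle is invisible to reference counting alone, so the memory stays pinned until a cycle collection runs.

```python
import gc

class Protocol:
    """Hypothetical stand-in for the Twisted protocol with ~50M attached."""
    def __init__(self):
        self.buf = bytearray(50 * 1024 * 1024)

def simulate_reconnect():
    # A reference cycle keeps the protocol (and its 50M) alive even after
    # all external references are gone; refcounting alone cannot free it.
    p = Protocol()
    p.self_ref = p
    # p goes out of scope here, but the cycle pins the memory

gc.disable()            # mimic the window between automatic collections
simulate_reconnect()
freed = gc.collect()    # an explicit collect reclaims the cycle
gc.enable()
print("unreachable objects collected:", freed)
```

With frequent "reconnects" and no collection in between, each cycle adds another 50M of dead-but-pinned memory, which is exactly the RSS growth described above.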
The algorithm I'd suggest is like this (supposedly readable pseudocode):

    gc.set_collect_mem_growth(v):
        assert float(v) >= 1
        gc.collect_mem_growth = v

    python_malloc(size):
        ram_size += size
        if ram_size > min_ram_size * gc.collect_mem_growth:
            gc.collect()             # python_free runs inside it
            min_ram_size = ram_size  # ram size after gc.collect()

    python_free(size):
        ram_size -= size
        min_ram_size = min(min_ram_size, ram_size)

The overhead of this should be zero, and it'll fix my testcase just fine. I believe the default should be 2 (equivalent to a 100% growth of RSS triggering a full collect); even though it alters the behaviour of the gc, I think it's a bug that so much memory can be leaked when it could be reclaimed instantly. I wouldn't change the other parameters: this size-based heuristic would be completely orthogonal to, and disconnected from, the current heuristics based on the number of objects. It took me a day, and precious help from the Twisted folks, to realize it wasn't a memleak in my spare-time Twisted application (but well, it was worth it, since I learnt that I had created a heisenbug by using __del__ to debug the apparent memleak ;). Thanks.
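The pseudocode above can be exercised as a pure-Python simulation (assumed semantics; the class and counter names are illustrative, only `collect_mem_growth` comes from the proposal). Each "reconnect" briefly pushes the live size past twice the tracked minimum, so the collect trigger fires on every cycle instead of waiting for an object-count threshold:

```python
class SizeHeuristic:
    """Simulation of the proposed size-based collect trigger (sketch)."""
    def __init__(self, growth=2.0):
        assert growth >= 1.0
        self.growth = growth
        self.ram_size = 0       # bytes currently allocated (simulated)
        self.min_ram_size = 0   # low-water mark since the last collect
        self.collections = 0

    def _collect(self):
        self.collections += 1   # a real hook would call gc.collect() here

    def malloc(self, size):
        self.ram_size += size
        if self.min_ram_size and self.ram_size > self.min_ram_size * self.growth:
            self._collect()
            self.min_ram_size = self.ram_size
        elif not self.min_ram_size:
            self.min_ram_size = self.ram_size  # bootstrap the low-water mark

    def free(self, size):
        self.ram_size -= size
        self.min_ram_size = min(self.min_ram_size, self.ram_size)

h = SizeHeuristic(growth=2.0)
h.malloc(50)            # baseline: 50 units stay live
for _ in range(10):     # each "reconnect" allocates and releases 60 units
    h.malloc(60)        # 110 > 50 * 2.0, so the trigger fires
    h.free(60)
print("collects triggered:", h.collections)   # -> 10
```

Note that the trigger compares against the *minimum* since the last collect, not the previous peak, so steady churn around a stable working set keeps firing rather than ratcheting the threshold upward.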
On Tue, Dec 27, 2005, Andrea Arcangeli wrote:
My suggestion to fix this problem in autopilot mode (without requiring an explicit gc.collect()) is to invoke gc.collect() (or anyway to go all the way down, freeing everything possible) at least once every time the amount of anonymous memory allocated by the interpreter doubles. The tunable should be a float >= 1. When the tunable is 1 the feature is disabled (so it works like current python today). The default should be 2 (which means invoking gc.collect() after a 100% increase of the anonymous memory allocated by the interpreter). We could also have yet another threshold that sets a minimum amount of RAM after which this size-based heuristic kicks in, but that's not very important and may not be worth it (when little memory is involved, gc.collect() should be fast anyway).
If you feel comfortable with C code, the best way to get this to happen would be to make the change yourself, then test to find out what effects this has on Python (in terms of speed and memory usage and whether it breaks any of the regression tests). Once you've satisfied yourself that it works, submit a patch, and post here again with the SF number. Note that since your tunable parameter is presumably accessible from Python code, you'll also need to submit doc patches and tests to verify that it's working correctly. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Don't listen to schmucks on USENET when making legal decisions. Hire yourself a competent schmuck." --USENET schmuck (aka Robert Kern)
On Wed, Dec 28, 2005 at 05:52:06AM -0800, Aahz wrote:
If you feel comfortable with C code, the best way to get this to happen would be to make the change yourself, then test to find out what effects
I'm more comfortable with C code than with python code, so that's not the problem (in fact, I think everyone should be comfortable with C ;). The only problem is that my time for this is quite limited, but I will really be happy to give it a try.
this has on Python (in terms of speed and memory usage and whether it breaks any of the regression tests). Once you've satisfied yourself that it works, submit a patch, and post here again with the SF number.
Ok.
Note that since your tunable parameter is presumably accessible from Python code, you'll also need to submit doc patches and tests to verify that it's working correctly.
Ok. If there's anybody willing to suggest the files to hook into (the location where the interpreter allocates all anonymous memory) and how to invoke gc.collect() from C, that would help. thanks!
Andrea Arcangeli wrote:
If there's anybody willing to suggest the files to hook into (the location where the interpreter allocates all anonymous memory) and how to invoke gc.collect() from C, that would help. thanks!
It all happens in Modules/gcmodule.c:_PyObject_GC_Malloc. There are per-generation counters; _PyObject_GC_Malloc increments the generation 0 counter, and PyObject_GC_Del decreases it. The counters of the higher generations are incremented when a lower collection occurs. One challenge is that PyObject_GC_Del doesn't know how large the memory block is that is being released. So it is difficult to find out how much memory is being released in the collection. Regards, Martin
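For context, the count-based thresholds Martin describes are visible from Python through the gc module; this small check (illustrative, not part of the proposal) shows that the generation-0 counter tracks how many objects were allocated, with no regard for how large each one is:

```python
import gc

# The existing heuristic counts allocations, not bytes: generation 0 is
# collected when (allocations - deallocations) exceeds the first threshold.
print(gc.get_threshold())        # CPython's default is (700, 10, 10)

gc.collect()                     # start from zeroed counters
before = gc.get_count()[0]
objs = [[] for _ in range(100)]  # 100 small tracked allocations
after = gc.get_count()[0]
# The gen-0 counter grew by roughly the number of objects; allocating 100
# huge lists instead of 100 empty ones would move it by the same amount.
print(after - before)
```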
Martin v. Löwis
One challenge is that PyObject_GC_Del doesn't know how large the memory block is that is being released. So it is difficult to find out how much memory is being released in the collection.
Another idea would be to add accounting to the PyMem_* interfaces. It could be that most memory is used by objects that are not tracked by the GC (e.g. strings). I guess you still have the same problem in that PyMem_Free may not know how large the memory block is. Neil
[Martin v. Löwis]
... One challenge is that PyObject_GC_Del doesn't know how large the memory block is that is being released. So it is difficult to find out how much memory is being released in the collection.
"Impossible in some cases" is accurate. When pymalloc isn't enabled, all these things call the platform malloc/free directly, and there's no portable/standard way to find out anything from those. When pymalloc is enabled, PyObject_GC_Del could be taught whether pymalloc controls the block being freed, and, when so, how to suck up the block's size index from the block's pool header; but when pymalloc doesn't control the memory being freed, it's the same as if pymalloc were not enabled. [Neil Schemenauer]
Another idea would be to add accounting to the PyMem_* interfaces. It could be that most memory is used by objects that are not tracked by the GC (e.g. strings).
I still expect this old code in pymem.h to go away for Python 2.5:

    /* In order to avoid breaking old code mixing PyObject_{New, NEW} with
       PyMem_{Del, DEL} and PyMem_{Free, FREE}, the PyMem "release memory"
       functions have to be redirected to the object deallocator. */
    #define PyMem_FREE PyObject_FREE

When it goes away, PyMem_FREE will resolve directly to the platform free(), and will no longer have even accidental relationships to any memory involved in cyclic gc.
I guess you still have the same problem in that PyMem_Free may not know how large the memory block is.
It will be more the case that we can guarantee it won't know -- but since direct uses of malloc/free have no useful relationship to cyclic gc behavior, the OP shouldn't care about that. In any case, the OP's original "the overhead of this should be zero" claim isn't credible (I checked, and there _still_ won't be free lunches in 2006 -- unless you work at Google ;-)).
Andrea Arcangeli wrote:
To implement this we need two hooks, in the malloc and free that allocate python objects. Then we have to store the minimum of this value (i.e. the last minimum of memory allocated by the interpreter).
I would like to underline Aahz' comment: it is unlikely that anything will happen about this unless you make it happen. This specific problem is not frequent, and the current strategy (collect if 1000 new objects are allocated) works fine for most people. So if you want a change, you should really consider coming up with a patch yourself. Bonus points if the code integrates with the current strategies, instead of replacing them. Regards, Martin
On Wed, Dec 28, 2005 at 03:32:29PM +0100, "Martin v. Löwis" wrote:
you should really consider coming up with a patch yourself. Bonus points if the code integrates with the current strategies, instead of replacing them.
As I wrote in the first email, I have no intention of replacing anything. The new heuristic would be completely orthogonal to the current strategy. The current strategy is a function of object count; the new heuristic would be a function of size, and they would co-exist perfectly.
On Dec 27, 2005, at 9:05 AM, Andrea Arcangeli wrote:
I ran into a problem recently with a reconnectingclientfactory with twisted while writing some spare time software, that turned out to be a gc inefficiency.
In short the protocol memory wasn't released after the reconnect and the protocol had about 50M attached to it. So with frequent reconnects the rss of the task grew to >1G and it pushed the system into heavy swap. In reality only 50M were allocated, so 950M were wasted by python.
In this particular case, you might be better off just writing some Twisted code that periodically checks the size of the current process and does a gc.collect() when necessary. Of course, it requires some platform specific code, but presumably you only care about one, maybe two, platforms anyway. -bob
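A rough sketch of Bob's suggestion on POSIX systems, using the stdlib resource module (the function name and threshold are illustrative; as Bob notes, this is platform-specific, e.g. ru_maxrss is reported in kilobytes on Linux but in bytes on macOS):

```python
import gc
import resource

def collect_if_large(limit_kb):
    """Run gc.collect() when the process's peak RSS exceeds limit_kb.

    ru_maxrss units vary by platform (KB on Linux, bytes on macOS);
    a real implementation would normalize per platform.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if rss > limit_kb:
        gc.collect()
        return True
    return False

# Hypothetical Twisted usage, polling every 10 seconds:
#   from twisted.internet.task import LoopingCall
#   LoopingCall(collect_if_large, 500 * 1024).start(10.0)
triggered = collect_if_large(10 ** 9)   # absurdly high limit: no collect
print("collected:", triggered)
```

As the reply below points out, polling on a timer triggers as a function of time rather than of size, so a spike between polls can still push the box into swap.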
On Thu, Dec 29, 2005 at 04:22:35AM -0500, Bob Ippolito wrote:
In this particular case, you might be better off just writing some Twisted code that periodically checks the size of the current process and does a gc.collect() when necessary. Of course, it requires some platform specific code, but presumably you only care about one, maybe two, platforms anyway.
Triggering as a function of time is not the same as triggering as a function of size: the timer may fire too late. And anyway the point was to do it in autopilot mode; I already fixed my app with a gc.collect() after releasing the huge piece of memory. I'll try to write a testcase for it, one that wouldn't push a system into heavy swap if python did what I suggest.
participants (6)
- "Martin v. Löwis"
- Aahz
- Andrea Arcangeli
- Bob Ippolito
- Neil Schemenauer
- Tim Peters