cross platform memory usage high water mark for improved gc?

Hi. Some months ago I complained on the python-list that the Python gc did too much work for apps that allocate and deallocate lots of structures. In fact one of my apps was spending about 1/3 of its time garbage collecting and not finding anything to collect (before I disabled gc). My proposal was that Python should have some sort of smarter strategy for garbage collection, perhaps involving watching the global high water mark for memory allocation or other tricks. The appropriate response was: "great idea! patch please!" :) Unfortunately dealing with cross platform memory management internals is beyond my C-level expertise, and I'm not having a lot of luck finding good information sources. Does anyone have any clues on this, or other ideas for improving the gc heuristic? For example, how do you find out the allocated heap size(s) in a cross platform way? This link provides some clues, but I don't really understand this code well enough to hope to patch gc. http://www.xfeedme.com/nucular/pydistro.py/go?FocusId=74&FREETEXT=high%20water%20mark -- Aaron Watters
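For reference, one rough way to ask the operating system for a process-wide memory high-water mark from Python is resource.getrusage(). This is only a sketch: the resource module is Unix-only, the units of ru_maxrss differ by platform, and it measures the whole process rather than the heap the collector cares about, so it is at best a hint for a heuristic like the one being proposed here.

    import sys

    def peak_memory():
        """Best-effort process high-water mark; not what gcmodule.c uses."""
        if sys.platform.startswith("win"):
            # No 'resource' module on Windows; one would need ctypes plus
            # GetProcessMemoryInfo, or a third-party package such as psutil.
            raise NotImplementedError("no portable peak-memory query here")
        import resource
        usage = resource.getrusage(resource.RUSAGE_SELF)
        # ru_maxrss is kilobytes on Linux but bytes on Mac OS X.
        return usage.ru_maxrss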

On Mon, Mar 10, 2008 at 8:33 AM, Aaron Watters <aaron.watters@gmail.com> wrote:
You can of course tweak gc.set_threshold() (and I would expect this to be quite effective, once you find out what an appropriate threshold0 is for your app.) I don't believe you'll find any existing counters of the current heap size though (be it number of allocated objects or total size consumed by those objects.) -- Adam Olsen, aka Rhamphoryncus
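For example, raising threshold0 by hand looks like this (the numbers are placeholders, not recommendations):

    import gc

    print gc.get_threshold()          # CPython defaults: (700, 10, 10)
    gc.set_threshold(100000, 10, 10)  # much larger threshold0: collections become rare
    # gc.disable() switches cyclic collection off entirely;
    # reference counting still reclaims acyclic garbage.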

On Mon, Mar 10, 2008 at 12:30 PM, Adam Olsen <rhamph@gmail.com> wrote:
It would be nice if the threshold would adjust based on the performance characteristics of the app. In particular it'd be nice if the garbage collector would notice when it's never finding anything and wait longer before the next collection attempt every time it finds nothing. How about this:
- The threshold slides between minimumThresh and maximumThresh.
- At each collection the current number of objects collected is compared to the last number collected (collectionTrend).
- If the collectionTrend is negative or zero, the next threshold slides towards the maximum.
- If the collectionTrend is a small increase, the threshold stays the same.
- If the collectionTrend is a large increase, the next threshold slides towards the minimum.
That way for apps that need no garbage collection (outside of refcounting) the threshold would slide to the maximum and stay there, but for apps that need a lot of gc the threshold would bounce up and down near the minimum. This is almost easy enough that I could implement it... -- Aaron Watters === http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=stupid+animation

> ...
> - If the collectionTrend is a small increase, the threshold stays the same.
Footnote: for stability you would not update the "last collection count" in this case, so the comparison is always against a fixed point until the threshold adjusts.... -- Aaron Watters === http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=arggh
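A minimal sketch of that sliding-threshold heuristic, written as ordinary Python for illustration only. CPython would actually have to drive something like adjust() from inside the collector; the names minimumThresh and maximumThresh come from the proposal above, while the doubling/halving rule and the "large increase" cutoff are arbitrary choices invented for this sketch.

    import gc

    minimumThresh = 700       # roughly the current default threshold0
    maximumThresh = 1000000   # "very high", per the proposal

    class SlidingThreshold(object):
        def __init__(self):
            self.threshold = minimumThresh
            self.last_collected = 0    # fixed comparison point (see the footnote)

        def adjust(self, collected):
            """Called after each collection with the number of objects freed."""
            trend = collected - self.last_collected
            if trend <= 0:
                # Found nothing new: back off towards the maximum.
                self.threshold = min(self.threshold * 2, maximumThresh)
                self.last_collected = collected
            elif trend > self.threshold // 10:   # "large increase" (arbitrary cutoff)
                # Lots of garbage: collect more eagerly again.
                self.threshold = max(self.threshold // 2, minimumThresh)
                self.last_collected = collected
            # A small increase keeps both the threshold and the comparison point.
            gc.set_threshold(self.threshold)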

On Tue, Mar 11, 2008 at 7:28 AM, Aaron Watters <aaron.watters@gmail.com> wrote:
It sounds plausible to me. But have you tried just tweaking the threshold? Surely there's a value at which it performs well, and that'd need to be within your maximum anyway. -- Adam Olsen, aka Rhamphoryncus

In that case it works best when gc is disabled. If I add a new feature, the gc requirements may change completely without me realizing it. I'm interested in not having to think about it :). -- Aaron Watters === http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=hello+there+goof+ball

On Tue, Mar 11, 2008 at 11:20 AM, Aaron Watters <aaron.watters@gmail.com> wrote:
You're concerned that a new feature may increase how high of a threshold you need, yet it could also exceed the "maximum" of your adaptive scheme. I'm not convinced you need that high of a threshold anyway. I'd like to see a benchmark showing how your app performs at different levels. -- Adam Olsen, aka Rhamphoryncus

You are absolutely right that I can set a threshold high enough. With the default values Python is extremely slow for certain cases. I'm arguing it should automatically detect when it is being stupid and attempt to fix it. In particular I would set the maximum very high and start at the minimum, which might be near the current defaults. For example I get the following in a simple test (python2.6):
In this case the interpreter is spending 80% of its time trying to collect non-existent garbage. Now a newbie who didn't know to go fiddling with the garbage collector might just conclude "python is ssslllooowwww" and go back to using Perl or Ruby or whatever in a case like this. Maybe the powers that be couldn't care less about it, I don't know. (I know newbies can be irritating). The problem is quadratic also: if I double the limit the penalty goes up by a factor of 4. Here is the source:

    def test(disable=False, limit=1000000):
        from time import time
        import gc
        if disable:
            gc.disable()
            print "gc disabled"
        else:
            print "gc not disabled"
        now = time()
        D = {}
        for i in range(limit):
            D[ (hex(i), oct(i)) ] = str(i)+repr(i)
        L = [ (y,x) for (x,y) in D.iteritems() ]
        elapsed = time()-now
        print "elapsed", elapsed

    if __name__=="__main__":
        import sys
        disable = False
        if "disable" in sys.argv:
            disable = True
        test(disable)

-- Aaron Watters === http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=being+anal
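For the record, the script is meant to be run both ways and the elapsed times compared; the filename here is arbitrary:

    python gctest.py           # cyclic gc enabled, default thresholds
    python gctest.py disable   # cyclic gc off; only reference counting runs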

On Tue, Mar 11, 2008 at 1:57 PM, Aaron Watters <aaron.watters@gmail.com> wrote:
Interesting. With some further testing, it's become clear that the problem is in gen2. gen0 and gen1 both add a constant overhead (their size is bounded), but gen2's size grows linearly, and with a linear number of scans that gives quadratic performance. I'm unsure how best to fix this. Anything we do will effectively disable gen2 for short-running programs, unless they do the right thing to trigger the heuristics. Long-running programs have a little more chance of triggering them, but may do so much later than desirable. Something must be done though. The costs should be linear with time, not quadratic. The frequency at which an object gets scanned should be inversely proportional to the number of objects to be scanned. -- Adam Olsen, aka Rhamphoryncus
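A toy model of why the cost goes quadratic, and of one possible reading of the "inversely proportional" rule (every number below is invented; only the shape of the growth matters): with a fixed interval between full sweeps and a live set that grows linearly, each sweep scans more objects than the last, so total work grows with the square of the run length, whereas making the interval between full sweeps proportional to the live count keeps total work roughly linear.

    def total_scan_work(total_allocations, interval=70000, proportional=False):
        """Total objects scanned by full (gen2) sweeps in a workload where
        nothing is ever garbage and the live set grows with every allocation."""
        work = allocated = 0
        next_sweep = interval
        while allocated < total_allocations:
            allocated = next_sweep
            live = allocated              # all allocated objects stay live
            work += live                  # a full sweep touches every live object
            next_sweep += live if proportional else interval
        return work

    if __name__ == "__main__":
        for n in (10**6, 2 * 10**6):
            # fixed interval: work roughly quadruples when n doubles;
            # proportional interval: work roughly doubles.
            print n, total_scan_work(n), total_scan_work(n, proportional=True)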

Aaron Watters wrote:
Have you read the code and comments in Modules/gcmodule.c? The cyclic GC has three generations. A gc sweep for the highest generation is started every 70,000 instructions. You can tune the levels for the generations yourself through the gc module's set_threshold() function. -- Christian

On Tue, Mar 11, 2008 at 12:57 PM, Christian Heimes <lists@cheimes.de> wrote:
Not instructions. There's a counter that's incremented on allocation and decremented on deallocation. Each time it hits 700 it triggers a collection. The collections are normally only gen0, but after 10 of those it does a gen1 collection, and after 10 gen1 collections it does a gen2 (i.e. a full) collection. Although, given the way the math is done, I think the 701st object will be the one that triggers the gen0 collection, and the 12th time that happens it does gen1 instead. I get a grand total of... 93233 objects to trigger a gen2 collection. Not that it matters. Without more detailed information on what the app is doing we can't seriously attempt to improve the heuristics for it. -- Adam Olsen, aka Rhamphoryncus
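Those counters are visible from Python, which makes it easy to watch when each generation actually gets swept (nothing below is specific to the app in this thread; it just inspects the stock collector):

    import gc

    print gc.get_threshold()   # (700, 10, 10) by default
    # gc.get_count() returns, roughly: (net new tracked objects since the last
    # gen0 sweep, gen0 sweeps since the last gen1 sweep, gen1 sweeps since the
    # last gen2 sweep).
    print gc.get_count()

    junk = [[] for i in range(1000)]   # an allocation burst
    print gc.get_count()               # shows how far along each counter is now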

Participants (3): Aaron Watters, Adam Olsen, Christian Heimes