[Python-Dev] C API for gc.enable() and gc.disable()

Sat Jun 21 23:36:00 CEST 2008

"Martin v. Löwis" writes:

 > Given the choice of "run slower" and "run out of memory", Python should
 > always prefer the former.
 > 
 > One approach could be to measure how successful a GC run was: if GC
 > finds that more-and-more objects get allocated and very few (or none)
 > are garbage, it might conclude that this is an allocation spike, and
 > back off. The tricky question is how to find out that the spike is
 > over.
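Here is a minimal Python sketch of the back-off idea quoted above. Everything in it is hypothetical: the function name, the 1% "low yield" cutoff, doubling as the back-off step, and the assumption that one productive collection means the spike is over. CPython 3.3+ passes a `collected` count to `gc.callbacks`, but not the number of objects examined, so `examined` is an assumed input that a real implementation would have to obtain from inside the collector.

```python
def backoff_threshold(threshold, collected, examined, base, ceiling, low_yield=0.01):
    """Hypothetical back-off rule for a generation-0 collection threshold.

    threshold: current number of allocations between collections
    collected: objects the last collection found to be garbage
    examined:  objects the last collection looked at (assumed available)
    """
    yield_ratio = collected / examined if examined else 0.0
    if yield_ratio < low_yield:
        # Almost nothing was garbage: treat this as an allocation spike and
        # collect less often, up to a hard ceiling.
        return min(threshold * 2, ceiling)
    # The collection found real garbage again; assume the spike is over and
    # return to the normal period.
    return base

# Usage: three unproductive collections in a row, then a productive one.
t = 700
for collected, examined in [(0, 700), (3, 1400), (0, 2800), (500, 5600)]:
    t = backoff_threshold(t, collected, examined, base=700, ceiling=100_000)
    print(t)
```

With these numbers the threshold goes 1400, 2800, 5600, and then drops back to 700 once a collection finds real garbage again.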

XEmacs implements this strategy in a way that is claimed to give
constant amortized time (i.e., averaged over memory allocated).  I
forget the exact parameters, but ISTR it's just that the period ("time"
measured in bytes allocated) is proportional to the currently allocated
memory.  Some people claim this is much more comfortable than the
traditional "GC after N bytes are allocated" algorithm, but I don't
notice much difference.  I don't know how well this intuition carries
over to noninteractive applications.
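To see why a period proportional to the heap gives constant amortized cost, here is a toy Python model. It is not XEmacs's actual code. It assumes the worst case for a spike (every allocated byte stays live) and that each collection costs time proportional to live memory. That is a simplification, since CPython's cycle collector scales with the number of tracked container objects instead. The 0.5 ratio and the 1 MiB minimum period are made-up parameters.

```python
MIB = 2 ** 20

def tracing_work(total_bytes, chunk, next_period):
    """Total collector work for an allocation spike in which nothing becomes garbage.

    Simplifying assumption: each collection costs time proportional to the
    live memory it has to trace, and every allocated byte stays live.
    """
    live = 0          # bytes currently allocated (all of them reachable)
    since_last = 0    # bytes allocated since the previous collection
    work = 0          # accumulated tracing cost, in bytes traced
    period = next_period(live)
    while live < total_bytes:
        live += chunk
        since_last += chunk
        if since_last >= period:
            work += live              # trace everything that is live
            since_last = 0
            period = next_period(live)
    return work

def fixed_period(live, n=MIB):
    # Traditional policy: collect after every n bytes, ignoring heap size.
    return n

def proportional_period(live, ratio=0.5, minimum=MIB):
    # XEmacs-like policy (parameters hypothetical): period grows with the heap.
    return max(minimum, ratio * live)

for size in (10 * MIB, 100 * MIB):
    for policy in (fixed_period, proportional_period):
        per_byte = tracing_work(size, 4096, policy) / size
        print(f"{size // MIB:>4} MiB  {policy.__name__:<20} {per_byte:.2f} traced per byte allocated")
```

Going from a 10 MiB to a 100 MiB spike, the fixed policy's cost per allocated byte grows about ninefold (5.5 to 50.5). The proportional policy stays near a constant (about 1.7 and 2.3), bounded by roughly (1 + ratio) / ratio, because successive collections trace a geometrically growing heap.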

In XEmacs, experimenting with such strategies is pretty easy, since the
function that determines the period is only a few lines long.  I assume
that would be true of Python, too.

However, isn't the real question whether there is memory pressure or
not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
spike might have no observable consequences.  How about a policy of
GC-ing with a period ("time" measured in bytes allocated or number of
allocations) that decreases as the fraction of memory in use increases,
starting once that fraction is fairly large (say 50% by default)?  The
total amount of memory could be a soft limit, defaulting to the amount
of fast memory actually available.
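Here is a sketch of that memory-pressure policy in Python. Several choices are assumptions made for illustration: the linear ramp between 50% and 100% of the soft limit, the parameter names, and the use of total physical RAM as "fast memory actually available". The RAM figure is read via `os.sysconf`, which only works on POSIX systems such as Linux.

```python
import os

def physical_memory_bytes():
    """Total physical RAM, used here as a stand-in for "fast memory available".

    POSIX-only (e.g. Linux); returns None where these sysconf names are missing.
    """
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size if pages > 0 and page_size > 0 else None

def gc_period(used, limit, max_period, min_period, start_fraction=0.5):
    """Hypothetical policy: how much allocation to allow between collections.

    Up to start_fraction of the soft limit, collect at the relaxed max_period;
    above it, shrink the period linearly so it reaches min_period at the limit.
    """
    fraction = used / limit
    if fraction <= start_fraction:
        return max_period
    if fraction >= 1.0:
        return min_period
    t = (fraction - start_fraction) / (1.0 - start_fraction)
    return max_period - t * (max_period - min_period)

# Fall back to the 2GB machine from the example if RAM size is unknown.
limit = physical_memory_bytes() or 2 * 2**30
for used_fraction in (0.25, 0.5, 0.75, 0.9, 1.0):
    print(used_fraction, gc_period(used_fraction * limit, limit, max_period=100_000, min_period=700))
```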

For interactive and maybe some batch applications, it might be
appropriate to generate a runtime warning as memory use approaches
certain limits, too.
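Here is one way to produce such a warning, sketched in Python with the standard `warnings` module. The `MemoryWarner` class and its 80%/95% default thresholds are hypothetical. The caller is assumed to supply current memory use in bytes, since measuring that portably is a separate problem.

```python
import warnings

class MemoryWarner:
    """Hypothetical helper: emit a RuntimeWarning the first time memory use
    crosses each given fraction of a soft limit."""

    def __init__(self, limit_bytes, fractions=(0.8, 0.95)):
        self.limit = limit_bytes
        self.pending = sorted(fractions)  # thresholds not yet reported

    def check(self, used_bytes):
        # Report every threshold crossed since the last check, lowest first.
        while self.pending and used_bytes >= self.pending[0] * self.limit:
            fraction = self.pending.pop(0)
            warnings.warn(
                f"memory use ({used_bytes} bytes) has reached "
                f"{fraction:.0%} of the {self.limit}-byte soft limit",
                RuntimeWarning,
                stacklevel=2,
            )

warner = MemoryWarner(limit_bytes=2 * 2**30)
warner.check(1 * 2**30)    # 50%: silent
warner.check(1.8 * 2**30)  # 90%: warns once, for the 80% threshold
```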

Nevertheless, I think the real solution has to be for Python
programmers to be aware that there is GC, and that they can tune it.
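Python code can already do some of this tuning through the `gc` module. `gc.disable()`, `gc.enable()`, `gc.isenabled()`, `gc.get_threshold()` and `gc.set_threshold()` are real functions. The two context managers wrapping them are a hypothetical convenience, and how much a given threshold helps depends on the Python version and the workload.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: turn off automatic collection for a block,
    restoring the previous enabled/disabled state afterwards."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

@contextmanager
def gc_thresholds(*thresholds):
    """Hypothetical helper: temporarily change the collection thresholds."""
    old = gc.get_threshold()
    gc.set_threshold(*thresholds)
    try:
        yield
    finally:
        gc.set_threshold(*old)

# Usage: build a large acyclic structure without triggering collections,
# then run one full collection at the end.
with gc_paused():
    table = [(i, str(i)) for i in range(200_000)]
gc.collect()

# Or keep GC on but make generation-0 collections much rarer for a while.
with gc_thresholds(50_000, 20, 20):
    index = {i: [i] for i in range(200_000)}
```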