[Python-Dev] Lazily GC tracking tuples

Kevin Jacobs jacobs@penguin.theopalgroup.com
Mon, 3 Jun 2002 21:52:31 -0400 (EDT)


On Mon, 3 Jun 2002, Tim Peters wrote:
> [Kevin Jacobs]
> > I didn't count it as a strike against the patch -- I had just hoped that
> > untracking tuples would result in faster execution than turning GC off and
> > letting my heap swell obscenely.
> 
> Kevin, keep in mind that we haven't run your app:  the only things we know
> about it are what you've told us.  That your heap swells obscenely when gc
> is off is news to me. *Does* your heap swell obscenely when turning gc off,
> but does not swell obscenely if you keep gc on?  If so, you should keep in
> mind too that the best way to speed gc is not to create cyclic trash to
> begin with.

This part of my app allocates a mix of several kinds of objects.  Some are
cyclic tuple trees, many are acyclic tuples, and others are complex class
instances.  To make things worse, the object lifetimes vary from ephemeral
to very long-lived.  Due to this mix, turning GC off *does* cause the heap
to swell, and we have to very carefully monitor the algorithm to relieve the
swelling frequently enough to prevent eating too much system memory, but
infrequently enough that we don't spend all of our time in the GC tracking
through the many acyclic tuples.  Also, due to the dynamic nature of the
algorithm, it is not trivial to avoid creating cyclic trash.

> > One extreme case could happend if I turn off GC, run my code, and let it
> > fill all of my real memory with tuples, and start swapping to disk.
> > Clearly, keeping GC enabled with the tuple untracking patch would result
> > in huge performance gains.
> 
> Sorry, this isn't clear at all.  If your tuples aren't actually in cycles,
> then whether GC is on or off is irrelevant to how long they live, and to how
> much memory they consume.  It doesn't cost any extra memory (not even one
> byte) for a tuple to live in a gc list; on the flip side, no memory is saved
> by Neil's patch.

But I do generate cyclic trash, and it does build up when I turn off GC.
So, if it runs long enough with GC turned off, the machine will run out of
real memory.  This is all that I am saying.

> > This is not the situation I was dealing with, though I was hoping for a
> > relatively smaller improvement from having a more compact and (hopefully)
> > less fragmented heap.
> 
> Neil's patch should have no effect on memory use or fragmentation.  It's
> only aiming at reducing the *time* spent uselessly scanning and rescanning
> and rescanning and ... tuples in bad cases.  So long as your tuples stay
> alive, they're going to consume the same amount of memory with or without
> the patch, and whether or not you disable gc.

With Neil's patch and GC turned ON:

  Python untracks many, many tuples that give for several generations that
  would otherwise have to scanned to dispose of the accumulating cyclic
  garbage.  Some GC overhead is observed, due to normal periodic scanning
  and the extra work needed to untrack acyclic tuples.

without Neil's patch and automatic GC turned OFF:

  Python does not spend any time scanning for garbage, and things are very
  fast until we run out of core memory, or the patterns of allocations
  result in an extremely fragmented heap due to interleaved allocation of
  objects with very heterogeneous lifetimes.  At certain intervals, GC would
  be triggered manually to release the cyclic trash.

My hope was that the more compact and hopefully less fragmented heap would
more than offset the overhead of automatic GC scans and the tuple untracking
sweep.

Hopefully my comments are now a little clearer...

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com