[Python-Dev] Lazily GC tracking tuples
Wed, 8 May 2002 08:32:52 -0700 (PDT)
(replying to Neil & Tim)
--- Tim Peters <email@example.com> wrote:
> [Neil Schemenauer, on Daniel Dunbar's interesting tuple-untrack scheme]
> > I'm not sure I like this approach. Slowing down PyTuple_SET_ITEM to
> > optimize a micro-benchmark seems dubious. However, dropping the number
> > of tracked objects by 50% _does_ seem worthwhile. Couldn't we get the
That was basically the opinion I reached as well - changing
PyTuple_SET_ITEM seems nasty in principle, and the performance
results were dubious at best... the 50% drop in tracked objects
also seemed like the most interesting thing to me; however, it's
not clear that that percentage would hold up under the dynamic
conditions of a real application.
> > same effect by checking and untracking the appropriate tuples after
> > finishing a gen1 collection? Each tuple would be checked once and short
> > lived tuples will probably not be checked at all.
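For the record, later CPython releases adopted essentially this
untrack-after-collection scheme, and it can be observed from Python;
a minimal sketch using the gc module on a modern CPython:

```python
import gc

# A tuple built at runtime starts out tracked by the cycle collector.
t = tuple([1, 2, 3])
assert gc.is_tracked(t)

# After a collection pass, a tuple containing only atomic
# (untrackable) objects is untracked, as proposed above.
gc.collect()
assert not gc.is_tracked(t)
```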
The other thing to note is that untracking such tuples is a
very small win regardless - if they were not untracked, the
tuple visit routine would quickly scan down the tuple (which
I guess is likely to be small to start with), and the GC
visitor would quickly reject adding any of the contained
objects because they are not PyObject_IS_GC().
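The PyObject_IS_GC() distinction is visible from Python as well - a
small sketch (using gc.is_tracked, available in modern CPython) of
which objects the collector tracks at all:

```python
import gc

# Atomic objects can never participate in a reference cycle,
# so the collector never tracks them in the first place.
assert not gc.is_tracked(42)
assert not gc.is_tracked("hello")

# Containers hold references, so they are tracked.
assert gc.is_tracked([])
assert gc.is_tracked(([],))  # a tuple holding a container stays tracked
```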
That suggests that the only cases where it would make a
difference are a) when there are so many untrackable tuples
that simply traversing them takes a large amount of time
(i.e. the iterzip case), or b) when there is massive nesting
of untrackable tuples - a large AST or binary tree might
benefit from this, but there are perhaps better solutions
to that problem (see below)
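A sketch of the nesting case, assuming a modern CPython that
untracks tuples during collection; how many passes a nest takes to
unwind is an implementation detail, so this loops a few collections:

```python
import gc

# A small binary-tree-ish nest of tuples holding only atomic values,
# built at runtime so nothing is constant-folded by the compiler.
leaf = tuple([4, 5])
tree = tuple([tuple([1, 2]), tuple([3, leaf])])
assert gc.is_tracked(tree)

# A pass can only untrack a tuple whose elements are already
# untracked, so the nest unwinds over at most a few passes.
for _ in range(3):
    gc.collect()
assert not gc.is_tracked(leaf)
assert not gc.is_tracked(tree)
```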
< snip - Tim talks about issues with removing tuples in the collector >
I still don't understand how you 'check the tuple' and then
untrack it - even if it has no NULL slots it still isn't
safe, since PyTuple_SET_ITEM can still store a new object
into it afterwards.
I don't think I agree that special-casing tuples in the
collector is a good idea - this whole issue just came
up because of the iterzip test, where it was noticed
that the collector really didn't have to check all
the objects. That is just a special-case solution to
the general problem that many of the objects in the gen*
lists are not going to be participating in a cycle.
A solution to that general problem would be much more
beneficial, although presumably much more work ;)
If the collector could somehow prioritize objects,
it seems like various heuristics could be used to
gauge the likelihood of an object participating in
a cycle vs. the 'good' of collecting that cycle.
As an aside: are there some nice established tests
for estimating the benefit of modifications to
the collector? Something that needs the collector
(so getting cycles early keeps the memory consumption
down, improving performance) and also has dynamic
object allocation properties that mimic some ideal
'real application'.
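Lacking an established suite, a crude benchmark along these lines is
easy to sketch - the sizes and mix below are arbitrary assumptions,
not an agreed-upon test:

```python
import gc
import time

def make_garbage(n):
    """Allocate a mix of acyclic tuples and genuine reference cycles."""
    acyclic = [tuple([i, i + 1]) for i in range(n)]
    for _ in range(n // 10):
        a, b = [], []
        a.append(b)
        b.append(a)  # a cycle: only the cycle collector can reclaim it
    return acyclic

garbage = make_garbage(100_000)
start = time.perf_counter()
found = gc.collect()  # returns the number of unreachable objects found
elapsed = time.perf_counter() - start
print(f"collected {found} unreachable objects in {elapsed:.4f}s")
assert found > 0
```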