(replying to Neil & Tim)
--- Tim Peters email@example.com wrote:
[Neil Schemenauer, on Daniel Dunbar's interesting tuple-untrack scheme]
I'm not sure I like this approach. Slowing down PyTuple_SET_ITEM to optimize a micro-benchmark seems dubious. However, dropping the number of tracked objects by 50% _does_ seem worthwhile. Couldn't we get the
That was basically the opinion I reached as well - changing PyTuple_SET_ITEM seems nasty in principle, and the performance results were dubious at best... the 50% drop in tracked objects also seemed like the most interesting thing to me, however, its not clear that that percentage would remain under the dynamic conditions of a real application.
same effect by checking and untracking the appropriate tuples after finishing a gen1 collection? Each tuple would be checked once and short lived tuples will probably not be checked at all.
The other thing to note is that untracking such tuples is a very small win regardless - if they were not untracked the tuple visit routine will quickly scan down the tuple (which I guess is likely to be small to start with), and the GC visitor will quickly reject adding any of the contained objects because they are not PyObject_IS_GC().
That suggests that the only case it would make a difference is a) when there are soooo many untrackable tuples that simply traversing them takes a large amount of time (ie. the iterzip case), or b) when there is a massive nesting of untrackable tuples - an large AST or binary-tree might benefit from this, but there are perhaps better solutions to that problem (see below)
< snip - Tim talks about issues with removing tuples in the collector >
I still don't understand how you 'check the tuple' and then untrack it - even if it has no NULL slots it still isn't safe to remove.
I don't think I agree that special casing tuples in the collector is a good idea - this whole issue just came up because of the iterzip test, where it was noticed that the collector really didn't have to check all the objects, which is just a special case solution to the problem that many of the objects in the gen* lists are not going to be participating in a cycle.
A solution to that general problem would be much more beneficial, although presumably much more work ;)
If the collector could somehow prioritize objects, it seems like various heuristics could be used to gauge the likelyhood of an object participating in a cycle vs. the 'good' of collecting that cycle.
As an aside: are there some nice established tests for estimating the benefit of modifications to the collector? Something that needs the collector (so getting cycles early keeps the memory consumption down, improving performance) and also has dynamic object allocation properties that mimic some ideal of real-applications?
===== daniel dunbar firstname.lastname@example.org
Do You Yahoo!? Yahoo! Health - your guide to health and wellness http://health.yahoo.com