I checked in a surprisingly large patch to change the way we collect a generation.
Before: Things directly reachable from outside the young generation were moved into a 'reachable' set in one pass. Things indirectly reachable from outside the young generation were moved into 'reachable' in a second pass. The 'young' list then contained the unreachable objects.
After: Things unreachable from outside the young generation are moved into an 'unreachable' set in one pass. The 'young' list contains the reachable objects when that's done.
The point was that almost everything is reachable in the end, and moving an object between lists costs six pointer stores (updating prev and next pointers in the object, and in each of the two lists). So if most stuff is doomed to be reachable in the end, better to move the unreachable stuff than to move the reachable stuff.
This seems to be a nice little win. If you want to know more, read the comments. An instrumented version showed this over a run of the Python test suite:
scanned 7437363 moved 36854 movedback 34389
scanned # of times the loop in move_unreachable() went around == # of objects moved # of times "if (gc->gc.gc_refs == 0)" in move_unreachable() triggered == # of objects moved into an unreachable set movedback # of times "if (gc_refs == GC_TENTATIVELY_UNREACHABLE)" in visit_reachable() triggered == the number of times move_unreachable() guessed wrong and an object had to be moved back into a reachable set
So the change saved about 7e6 object moves here, for 6x as many pointer stores, and gc is finding very little that's unreachable in the end.
Surprisingly, the worst (for some technical meaning of "worst" I'll leave to your imagination) stats I've seen came out of running Zope3's test suite:
scanned 649444 moved 56124 movedback 43576
It's surprising for lots of reasons, including how relatively little work gc is doing in total, and how relatively much was found to be unreachable (about 12 thousand objects). The latter is surprising because Zope code has traditionally tried like heck not to create cycles.
The Python test suite *almost* gave "the best" (most favorable to the change) stats I've seen. Only a variant of Kevin Jacobs's little test case looked better so far:
scanned 12322200 moved 244 movedback 244
It would be nicer if we could drive scanned there down to 0 <wink>.