[Python-Dev] Big trouble in CVS Python

Tim Peters tim_one@email.msn.com
Sun, 13 Apr 2003 18:07:05 -0400


[Jeremy Hylton]
> We've had a lot of changes to the function call implementation over the
> last couple of months.  What's the chance that this is just the first
> time we've noticed the problem?

Slim, I think -- anything systematically screwing up refcounts on calls
would have lots of opportunities to create trouble.  This one was unique and
shy.

> Seems pretty plausible that the recent GC changes just exposed an
> earlier bug.

For all the code changes, the only intended semantic difference was in
has_finalizer's implementation details.  So that didn't seem likely either.

Turned out that the damaged co_consts was attached to the test that
exercised the new C code at fault.  The code was compiled gazillions of
cycles before the test was executed, though, and gazillions more cycles
passed before GC bumped into the damage.  If GC hadn't bumped into it, the
memory would have gotten allocated to some other float, and then would have
been decref'ed incorrectly when the original co_consts got deallocated.  So
it *could* have been much harder to track down <shudder>.

What I still don't grasp is why a debug run never failed with a
negative-refcount error.  Attaching the prematurely-freed float to the float
free list doesn't change its refcount field -- that remains 0.  So if it was
still in the free list when the original co_consts got reclaimed, we should
have had a negrefcnt death.  OTOH, if the memory was handed out to another
float, then when the original co_consts got reclaimed it would have knocked
that float's refcount down too, which should lead to a negrefcnt death
later.  Maybe co_consts never did get reclaimed?  I'm not clear on how much
we let slide at shutdown.
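
The two cases in the last paragraph can be played out in a toy model.  This
is only a sketch, not CPython's code: Slot, free_list, new_float, decref,
dealloc_consts and NegativeRefcount are all made-up names, and the only thing
modelled is the refcount field plus a float free list that leaves that field
at 0.  Under those assumptions, both paths end in a negative-refcount error,
just at different times.

```python
class NegativeRefcount(Exception):
    """Hypothetical stand-in for a debug build's negative-refcount death."""


class Slot:
    """One float-sized chunk of memory; only the refcount field is modelled."""

    def __init__(self):
        self.refcnt = 0


free_list = []  # toy float free list


def new_float():
    # Reuse a freed slot if one is available, as a free list would.
    slot = free_list.pop() if free_list else Slot()
    slot.refcnt = 1
    return slot


def decref(slot):
    slot.refcnt -= 1
    if slot.refcnt < 0:
        raise NegativeRefcount
    if slot.refcnt == 0:
        # Going onto the free list leaves the refcount field at 0.
        free_list.append(slot)


def dealloc_consts(consts):
    # Reclaiming the tuple drops one reference to each item.
    for slot in consts:
        decref(slot)


def case_still_on_free_list():
    free_list.clear()
    f = new_float()
    consts = (f,)          # co_consts holds the only legitimate reference
    decref(f)              # the bug: an extra decref frees the float early
    try:
        dealloc_consts(consts)   # 0 -> -1
    except NegativeRefcount:
        return "dies at co_consts dealloc"
    return "no error"


def case_memory_reused():
    free_list.clear()
    f = new_float()
    consts = (f,)
    decref(f)              # the bug again: slot is now on the free list
    other = new_float()    # the same memory is handed out to another float
    assert other is f
    dealloc_consts(consts)  # knocks the new float down to 0; no error yet
    try:
        decref(other)       # the new owner's own decref: 0 -> -1
    except NegativeRefcount:
        return "dies later, at the other owner's decref"
    return "no error"
```

Calling case_still_on_free_list() and case_memory_reused() returns the two
"dies" strings, which matches the expectation that a debug run should have
failed one way or the other if co_consts had actually been reclaimed.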