06 Apr 2003 23:45:05 -0400
On Sun, 2003-04-06 at 20:47, Tim Peters wrote:
> BTW, I'm still wondering why the ZODB thread test failed the way it did for
> Tres and Barry and me: you saw corrupt gc lists, but the rest of us never
> did. We saw a Connection instance with a mysteriously cleared __dict__.
> That's consistent with the __getattr__-hook-resurrects-an-
> object-reachable-only-from-an-unreachable-cycle examples I posted, but did
> you guys figure out on Friday whether that's what was actually happening?
> The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting
> unreachable objects while gc was still crawling over them, and that's a
> different (albeit related) problem than __dicts__ getting cleared by magic.
[Note to everyone else: there's a lot of ZODB-specific detail in the
answer. It might not interest anyone beyond ZODB developers.]
The __getattr__ code in ZODB made a large cycle of objects reachable
again. The __getattr__ hook called a method on a ZODB Connection, and
the Connection registered itself with the current transaction
(basically, a global resource). Then the garbage collector ran tp_clear
on the Connection. Now the Connection was a zombie, but it was also
registered with a transaction. When the transaction committed or
aborted, the code failed because the Connection no longer had any
attributes.
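To make the failure mode concrete, here is a toy simulation in modern Python. It is only a sketch: `Transaction`, `Connection`, `current_transaction`, `touch()` and `commit_changes()` are hypothetical stand-ins, not ZODB's real classes or API, and the collector's tp_clear is mimicked by clearing the instance's `__dict__` by hand (modern CPython's gc will not reproduce the original bug on its own).

```python
# Hypothetical stand-ins for illustration only; NOT ZODB's API.
class Transaction:
    """Toy global transaction that remembers its registered participants."""

    def __init__(self):
        self._participants = []

    def register(self, obj):
        self._participants.append(obj)

    def commit(self):
        # Each participant is assumed to still have its normal attributes.
        for obj in self._participants:
            obj.commit_changes()


# Plays the role of the global resource the Connection registers with.
current_transaction = Transaction()


class Connection:
    def __init__(self, name):
        self.name = name
        self.pending = []

    def touch(self):
        # In the real bug this was reached from a __getattr__ hook; the
        # registration makes the object reachable from a global again.
        current_transaction.register(self)

    def commit_changes(self):
        return f"{self.name}: committed {len(self.pending)} changes"


conn = Connection("conn-1")
conn.touch()           # the object is now reachable via the transaction
conn.__dict__.clear()  # mimic tp_clear wiping the instance's attributes

failure = None
try:
    current_transaction.commit()
except AttributeError as exc:
    # The "zombie" has no attributes left, so the commit blows up.
    failure = str(exc)
```

The point of the sketch is the ordering: registration with the global happens before the clear, so the transaction ends up holding a live reference to an object whose state is already gone.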
I got particularly lucky with my compiler/platform/Python
version/whatever. Part of the code in __getattr__ deleted a key-value
pair from a dictionary. I think that was partly chance; there was
nothing about the code that guaranteed the key was in the dict, but it
deleted it if it was. The value in the dict was a weakref, so deleting
the entry freed the weakref, which decrefed and deallocated its
callback function. Just by luck, the
callback function was the next thing in the unreachable gc list. So I
got a segfault when I dereferenced the now-freed GC header of the