Troubleshooting garbage collection issues
Rhamphoryncus
rhamph at gmail.com
Sun Nov 18 18:07:09 EST 2007
On Nov 17, 10:34 am, "davemer... at gmail.com" <davemer... at gmail.com>
wrote:
> Hi folks - wondering if anyone has any pointers on troubleshooting
> garbage collection. My colleagues and I are running into an
> interesting problem:
>
> Intermittently, we get into a situation where the garbage collection
> code is running in an infinite loop. The data structures within the
> garbage collector have been corrupted, but it is unclear how or why.
> The problem is extremely difficult to reproduce consistently as it is
> unpredictable.
>
> The infinite loop itself occurs in gcmodule.c, update_refs. After
> hitting this in the debugger a couple of times, it appears that
> one of the nodes in the second or third generation list contains a
> pointer to the first generation head node. The first generation was
> cleared shortly before the call into this function, so its prev and
> next pointers both point back to itself. Once the loop hits that node,
> it spins infinitely.
>
> Chances are another module we're depending on has done something
> hinky with GC. The challenge is tracking that down. If anyone has
> seen something like this before and has either pointers to specific GC
> usage issues that can create this behavior or some additional thoughts
> on tricks to track it down to the offending module, they would be most
> appreciated.
>
> You can assume we've done some of the "usual" things - hacking up
> gcmodule to spit information when the condition occurs, various
> headstands and gymnastics in an attempt to identify reliable steps to
> reproduce - the challenge is the layers of indirection that we think
> are likely present between the manifestation of the problem and the
> module that produced it.
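
To make the failure mode above concrete, here is a toy Python model of it. This is only a sketch, not the code in gcmodule.c: Node, append and walk are made-up names, and the step limit exists only so the demonstration terminates. It assumes, as described above, that each generation is a circular doubly linked list with a sentinel head, and that the loop in update_refs follows next pointers until it is back at the head it started from.

```python
class Node:
    """Toy stand-in for a GC list entry (hypothetical, not the real struct)."""
    def __init__(self, name):
        self.name = name
        # An empty circular list: the head points at itself both ways.
        self.next = self
        self.prev = self


def append(head, node):
    """Link node in just before head, i.e. at the tail of head's list."""
    tail = head.prev
    tail.next = node
    node.prev = tail
    node.next = head
    head.prev = node


def walk(head, limit=1000):
    """Follow next pointers from head until we arrive back at head.

    Returns the names visited. Raises RuntimeError after `limit` steps;
    a loop without such a limit would simply spin forever.
    """
    seen = []
    node = head.next
    while node is not head:
        if len(seen) >= limit:
            raise RuntimeError(
                "never returned to %s; stuck at %s" % (head.name, node.name))
        seen.append(node.name)
        node = node.next
    return seen


gen0 = Node("gen0-head")   # freshly cleared: next and prev point to itself
gen2 = Node("gen2-head")
a, b = Node("a"), Node("b")
append(gen2, a)
append(gen2, b)
healthy = walk(gen2)       # visits a, then b, then stops at gen2-head

# Simulate the corruption: a node in the older list points at the empty
# young-generation head instead of its real successor.
a.next = gen0
try:
    walk(gen2)
    stuck = False
except RuntimeError:
    stuck = True           # the walk can never get back to gen2-head
```

Once the walk steps onto the empty head it keeps following that head's self-pointer, so the termination test against the older generation's head can never succeed.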

Does "usual things" also include compiling with --with-pydebug?

You could also try the various memory debuggers. A refcounting error
is the first thing that comes to mind, although I can't see offhand
how this specific problem would come about.

Are you using threading at all?

Do you see any pattern to the types that have the bogus pointers?
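
As a rough aid for the last question, something like the following lists which types the collector is currently tracking. It is only a sketch: tracked_type_counts is a made-up helper name, and it will not find the corrupt node itself. It only uses gc.get_objects() to show which extension types take part in GC at all, which may help narrow down the modules worth auditing.

```python
import gc
from collections import Counter


def tracked_type_counts(top=10):
    """Return the `top` most common (module, type name) pairs among the
    objects the collector is tracking right now (hypothetical helper)."""
    counts = Counter()
    for obj in gc.get_objects():
        t = type(obj)
        counts[(t.__module__, t.__name__)] += 1
    return counts.most_common(top)


for (module, name), n in tracked_type_counts():
    print("%6d  %s.%s" % (n, module, name))
```

Types from third-party extension modules that show up here are the ones whose tp_traverse, tp_clear and allocation code would be worth a close look.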
--
Adam Olsen, aka Rhamphoryncus