[Python-Dev] Re: Another test_compiler mystery

Tue Aug 10 08:48:15 CEST 2004

Well, this gets nasty.

In a debug build, and starting the loop at 0, I can't get off the
ground in the MS 7.1 debugger.  It dies quickly with an access
violation in the bowels of ntdll.dll, and I don't have source for
that.

PyOS_CheckStack on Windows does this to detect stack overflow (it
catches an MS exception if the C runtime can't allocate enough room on
the stack):

	alloca(PYOS_STACK_MARGIN * sizeof(void*));

It's trying to see whether there's still room for 2K pointers on the
stack.  If I multiply that by 2, or by 3, nothing changes.  But if I
multiply it by 4, everything changes.  Then the "oops!  we're gonna
blow the stack!" exit from PyOS_CheckStack is taken.  It returns 1 to
_Py_CheckRecursiveCall, which sets a "stack overflow" MemoryError and
returns -1 to its caller.
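Here are the actual sizes behind those multipliers.  This is a minimal Python sketch of the arithmetic only.  It assumes PYOS_STACK_MARGIN is 2048 (the "2K pointers" above) and 4-byte pointers, as in a 32-bit Windows build.  The helper name probe_bytes is hypothetical.  It computes the size handed to alloca() and never touches the real stack.

```python
PYOS_STACK_MARGIN = 2048   # "room for 2K pointers"
SIZEOF_VOID_P = 4          # assumption: 32-bit Windows build

def probe_bytes(multiplier=1, margin=PYOS_STACK_MARGIN, ptr_size=SIZEOF_VOID_P):
    """Bytes the alloca() probe in PyOS_CheckStack would request when the
    margin is scaled by `multiplier` (hypothetical helper, arithmetic only)."""
    return multiplier * margin * ptr_size

# The multipliers tried in this message.
for m in (1, 2, 3, 4, 20):
    print(f"x{m:<2} -> {probe_bytes(m) // 1024} KB")
```

Under those assumptions, the stock check asks for 8 KB of headroom.  The x4 setting, the first that changes behavior, asks for 32 KB, and x20 asks for 160 KB.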

The caller is

	if (Py_EnterRecursiveCall(" in cmp"))
		return NULL;

in PyObject_RichCompare.  That's just trying to compare two ints.

So NULL gets returned to PyObject_RichCompareBool, which in turn
returns -1 to lookdict.  AAAARGHGH!  lookdict "isn't allowed" to raise
exceptions, so it does a PyErr_Clear(), goes on to futilely chase the
entire dict looking for another match on the hash code, and we've
effectively turned a MemoryError into a KeyError.  I expect that
explains a lot about what we see in the release-build runs.
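A toy Python model of that chain, for illustration only.  It mimics the C convention (return NULL or -1 and leave an error pending) with a module-level "pending error" slot.  All the names here are hypothetical stand-ins, not CPython's API: rich_compare, rich_compare_bool, lookdict, and the stack_ok callback that plays the role of the failing stack check.  The linear scan stands in for lookdict's probe sequence and skips its identity shortcut, so every hit goes through a comparison.

```python
# Hypothetical model of the C-level convention: failure is signalled by
# returning None (NULL) or -1 while leaving an error "pending".
pending_error = None

def set_error(exc):
    global pending_error
    pending_error = exc

def clear_error():
    """Stands in for PyErr_Clear()."""
    global pending_error
    pending_error = None

def rich_compare(a, b, stack_ok):
    # Stands in for PyObject_RichCompare: the recursion/stack guard runs first.
    if not stack_ok():
        set_error(MemoryError("Stack overflow"))
        return None                      # NULL
    return a == b

def rich_compare_bool(a, b, stack_ok):
    # Stands in for PyObject_RichCompareBool: NULL becomes -1.
    res = rich_compare(a, b, stack_ok)
    if res is None:
        return -1
    return 1 if res else 0

def lookdict(entries, key, stack_ok):
    """entries: list of (hash, key, value).  Returns value or raises KeyError."""
    h = hash(key)
    for eh, ek, ev in entries:
        if eh != h:
            continue
        cmp = rich_compare_bool(ek, key, stack_ok)
        if cmp < 0:
            clear_error()                # the MemoryError is thrown away...
            continue                     # ...and the search keeps going
        if cmp > 0:
            return ev
    raise KeyError(key)                  # MemoryError has become KeyError
```

With stack_ok always returning False, looking up 299 ends in KeyError: 299, and no error is left pending.  That has the same shape as the KeyError lines in the debug-build output.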

If I multiply the stack check by 20, I can finally get some results
out of the debug build:

0 exceptions.KeyError 299
1 exceptions.MemoryError Stack overflow
2 exceptions.MemoryError Stack overflow
3 exceptions.MemoryError Stack overflow
4 exceptions.KeyError 295
5 exceptions.MemoryError Stack overflow
6 exceptions.KeyError 294
7 exceptions.MemoryError Stack overflow
8 exceptions.MemoryError Stack overflow
9 exceptions.MemoryError Stack overflow
10 exceptions.MemoryError Stack overflow
11 exceptions.MemoryError Stack overflow
12 exceptions.MemoryError Stack overflow
13 exceptions.MemoryError Stack overflow
14 exceptions.KeyError 309
15 exceptions.KeyError 296
...

So we're blowing the C stack left and right in this test case, and
sometimes dict lookup turns that into a KeyError.

The question is what we did since 2.3.4 that apparently increases our
stack demands, and grossly increases them in a debug build(!).  Could
be that the compiler package is more heavily recursive now too (no
idea).  test_parser.py in 2.3.4 contained the same deeply nested
tuples, so that's not what changed.

Back in a release build, and restoring the original Windows
stack-check code, but leaving the driver loop starting at 0, I have to
call sys.setrecursionlimit(16) to avoid getting any KeyErrors.
sys.setrecursionlimit(878) is the minimum that allows at least one
"ok" to show up:

0 ok
1 exceptions.RuntimeError maximum recursion depth exceeded
2 exceptions.RuntimeError maximum recursion depth exceeded
3 exceptions.RuntimeError maximum recursion depth exceeded
4 exceptions.KeyError 307
5 exceptions.RuntimeError maximum recursion depth exceeded
6 exceptions.KeyError 306
7 exceptions.RuntimeError maximum recursion depth exceeded
8 exceptions.KeyError 305
...
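The recursion limit is only a counter.  The C stack is a separate, fixed budget.  Whichever runs out first decides what you see.  Below is a deliberately simplified Python model of that race.  It assumes a constant, hypothetical number of C stack bytes per Python-level call, which real code does not have.  The function first_failure and all its numbers are made up for illustration.

```python
def first_failure(depth_needed, recursion_limit, c_bytes_per_level, c_stack_bytes):
    """Hypothetical model: which guard trips first for a computation needing
    `depth_needed` nested Python-level calls?"""
    for depth in range(1, depth_needed + 1):
        if depth > recursion_limit:
            return "RuntimeError"        # the Python-level depth counter
        if depth * c_bytes_per_level > c_stack_bytes:
            # In the runs above this shows up as MemoryError, as a KeyError
            # (via lookdict), or as a hard crash.
            return "C stack exhausted"
    return "ok"
```

A tiny limit makes the counter fire long before the stack is at risk, which is why setrecursionlimit(16) makes the KeyErrors go away.  A larger per-level cost, as in a debug build, shrinks the depth the same stack can support.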

Misc/find_recursionlimit.py in CVS manages to print

    Limit of 1000 is fine

before it craps out in a release build; in a debug build, it doesn't
produce *any* output.  If I change the limit it starts with to 100, it
manages to get to

   Limit of 400 is fine

in a debug build before stopping without a clue.

Hmm!  But over in a 2.3.4 build, a release build also stopped after 1000,
and a debug build also exited mysteriously.  But after reducing its
starting point to 100, it got to

    Limit of 700 is fine

before crapping out.
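The "no output at all" versus "got to 400" difference follows from how such a driver steps through limits.  Here is a hypothetical Python model of a find_recursionlimit-style loop.  It assumes the script raises the limit in steps of 100 and prints a line only after a limit survives.  It also models a crash as a fixed depth ceiling, max_safe_depth, which is invented for illustration.

```python
def fine_limits(start, max_safe_depth, step=100):
    """Hypothetical model: try limits start, start+step, ... and report each
    one that survives.  Any limit above `max_safe_depth` models a hard crash,
    which ends the run with no further output."""
    reported = []
    limit = start
    while limit <= max_safe_depth:
        reported.append(f"Limit of {limit} is fine")
        limit += step
    return reported
```

If the real ceiling is below the starting limit, the run prints nothing, as with the debug builds started at 1000.  Starting lower reports every step up to the last one under the ceiling.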

BTW, in 2.3.4 and CVS, when a debug run craps out mysteriously like
this, it has an exit code of 128.  That's scary:

    http://support.microsoft.com/support/kb/articles/q184/8/02.asp