[Python-Dev] Fun with 2.3 shutdown

Tim Peters tim.one at comcast.net
Fri Sep 19 15:11:14 EDT 2003


[Tim]
> When the Zope3 tests are run under Python 2.3, after the test runner
> ends we usually get treated to a long string of these things:
>
> """
> Unhandled exception in thread started by
> Error in sys.excepthook:
> Original exception was:
>
> """

[bunch of analysis deleted]

> ...
> Event.wait() with a timeout ends up in _Condition.wait(), where a
> lazy busy loop wakes up about 20 times per second to see whether it
> can proceed.
>
> For some reason an exception is getting raised in the wait() code.
> I'm not exactly sure what or why yet, but that will come soon enough.

It didn't.  The primary effect of adding some vanilla debugging prints to
threading.py's _Condition.wait() was to make Python die with segfaults at
shutdown time instead.
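For readers who haven't looked inside threading.py: the quoted text describes a timed wait that polls instead of blocking. Below is a minimal sketch of that kind of lazy busy loop, written in modern Python 3 syntax. The function name, the initial delay, and the doubling backoff are assumptions made for illustration. Only the "about 20 times per second" ceiling (a 0.05 s maximum sleep) comes from the text above. The point it illustrates is that a thread parked in such a loop is never idle: every wakeup runs Python-level code (a non-blocking acquire, a clock read, a sleep call).

```python
import threading
import time


def lazy_timed_acquire(lock, timeout, max_delay=0.05):
    """Hypothetical helper: try to acquire `lock` within `timeout` seconds by polling.

    Each pass does a non-blocking acquire, then sleeps. The sleep doubles each
    time but is capped at `max_delay` (0.05 s gives at most ~20 wakeups/second)
    and never overshoots the deadline.
    """
    endtime = time.monotonic() + timeout
    delay = 0.0005  # doubled before the first sleep, so the first sleep is ~1 ms
    while True:
        if lock.acquire(blocking=False):
            return True
        remaining = endtime - time.monotonic()
        if remaining <= 0:
            return False
        delay = min(delay * 2, remaining, max_delay)
        time.sleep(delay)


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()
    # Another thread releases the lock after 0.2 s; the poller notices on a later wakeup.
    threading.Timer(0.2, lock.release).start()
    print(lazy_timed_acquire(lock, timeout=1.0))  # True
```

Blocking on the lock with a timeout would avoid the periodic wakeups entirely. A polling loop like this one is the pattern the quoted text says Event.wait() with a timeout falls into.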

If Python's *second* call to PyGC_collect() in Py_Finalize() is commented
out (the call that occurs after

	/* Destroy all modules */
	PyImport_Cleanup();

), all the problems go away, including the nonsense errors sprayed out at
the end of Zope3 test runs.

Recall that the nonsense errors are caused by a dozen stale daemon threads
trying to execute Python code after the interpreter has been severely torn
down, and they get the *chance* to do this because the second PyGC_collect()
finds trash with Python __del__ methods (so PyGC_collect() loses the GIL
when calling the __del__ methods, and all the daemon threads can proceed
then).  Note that this isn't a problem with code *in* __del__ methods!  That
makes it a different kind of shutdown glitch than we've usually wrestled
with.  It's a problem with Python code that has nothing to do with __del__;
__del__'s only contribution is to release the GIL.

Because the second PyGC_collect() *is* finding additional finalizers to run,
it's unattractive to stop calling it (getting more user-defined finalizers
to run was the purpose of adding these PyGC_collect() shutdown calls to
2.3).

OTOH, any __del__ method that runs after PyImport_Cleanup() will be (AFAICT)
just as vulnerable to producing nonsense errors and segfaults as the code in
_Condition.wait() has proven to be (sys is useless by that point, and all
the Python internal code sucking basic objects out of sys isn't expecting to
get None back).

Maybe we should remove the second PyGC_collect() call before more apps run
into these mysteries.

Maybe we should delay tearing down sys as a special case (even more of a
special case than it is now).

Maybe the Zope3 tests should stop leaving an ever-growing number of daemon
threads around (which appears to be the only solution so long as they're run
under Python 2.3).
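As an illustration of the last option, here is a minimal sketch of a worker thread that a test can shut down explicitly instead of leaving it behind as a daemon. It uses modern Python 3 syntax. It is not Zope3 code, and the class name, the `shutdown_requested` attribute, and the `stop()` method are all hypothetical. The idea is that a test's teardown calls stop(), which sets an Event and joins the thread, so nothing is left running when the interpreter starts finalizing.

```python
import threading


class StoppableWorker(threading.Thread):
    """Hypothetical worker that exits on request instead of lingering as a daemon."""

    def __init__(self, interval=0.05):
        super().__init__(daemon=False)  # non-daemon: shutdown waits for it, so it must be stopped
        self.shutdown_requested = threading.Event()
        self.interval = interval
        self.ticks = 0

    def run(self):
        # Event.wait(timeout) returns True as soon as stop() sets the event,
        # and False when the interval simply elapses.
        while not self.shutdown_requested.wait(self.interval):
            self.ticks += 1  # stand-in for real periodic work

    def stop(self, timeout=5.0):
        """Ask the thread to exit and wait for it; return True if it is gone."""
        self.shutdown_requested.set()
        self.join(timeout)
        return not self.is_alive()
```

A test would start the worker in its setup and call `worker.stop()` in its teardown, asserting that it returns True, so that a leaked thread shows up as a test failure rather than as noise at interpreter shutdown.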
