Better module shutdown procedure

The current shutdown code in pythonrun.c zaps module globals by setting them to None (an attempt to break reference cycles). That causes problems since __del__ methods can try to use the globals after they have been set to None. The procedure implemented by http://bugs.python.org/issue812369 seems to be a better idea. References to modules are replaced by weak references and the GC is allowed to cleanup reference cycles. I would like to commit this change but I'm not sure if it is a good time in the development cycle or if anyone has objections to the idea. Please speak up if you have input. Neil

On 08:16 pm, nas@arctrix.com wrote:
The current shutdown code in pythonrun.c zaps module globals by setting them to None (an attempt to break reference cycles). That causes problems since __del__ methods can try to use the globals after they have been set to None.
The procedure implemented by http://bugs.python.org/issue812369 seems to be a better idea. References to modules are replaced by weak references and the GC is allowed to cleanup reference cycles.
I would like to commit this change but I'm not sure if it is a good time in the development cycle or if anyone has objections to the idea. Please speak up if you have input.
I notice that the patch doesn't include any unit tests for the feature being provided (it does change test_sys.py - although I don't understand how that change is related to this feature). I hope that regardless of whatever else is decided, if the change is to be made, it will be accompanied by new unit tests verify its proper operation. Jean-Paul

On Wed, Oct 14, 2009 at 08:35:28PM -0000, exarkun@twistedmatrix.com wrote:
I notice that the patch doesn't include any unit tests for the feature being provided
That's hard to do although I suppose not impossible. We would be trying to test the interpreter shutdown procedure. I suppose on plaforms that support it we could fork a subprocess and then inspect the output from __del__ methods.
(it does change test_sys.py - although I don't understand how that change is related to this feature).
That's a spurious change and will be left out of the real fix. Neil

On Wed, Oct 14, 2009 at 2:45 PM, Neil Schemenauer <nas@arctrix.com> wrote:
On Wed, Oct 14, 2009 at 08:35:28PM -0000, exarkun@twistedmatrix.com wrote:
I notice that the patch doesn't include any unit tests for the feature being provided
That's hard to do although I suppose not impossible. We would be trying to test the interpreter shutdown procedure. I suppose on plaforms that support it we could fork a subprocess and then inspect the output from __del__ methods.
Why not just expose the module-teardown procedure and call it on a module, then inspect the state there as it's being torn down?

Neil Schemenauer <nas <at> arctrix.com> writes:
I would like to commit this change but I'm not sure if it is a good time in the development cycle or if anyone has objections to the idea. Please speak up if you have input.
Have you tried to remove the various hacks/workarounds in the stdlib that currently avoid module finalization problems? (take a look at e.g. threading.py) It would be a good validation that the patch improves on the current behaviour. Regards Antoine.

Antoine Pitrou <solipsis@pitrou.net> wrote:
Have you tried to remove the various hacks/workarounds in the stdlib that currently avoid module finalization problems?
Good idea. These hacks are the best argument for applying the change, IMHO. I made a quick review of Lib modules using __del__: tarfile: looks not safe tempfile: goes through elaborate motions to avoid globals threading: yikes, I didn't attempt to decypher this one shelve: probably not safe dumbdb: avoids globals, easy to remove hacks _pyio: try/except can be removed (hopefully) urllib: avoids globals, easy to remove hacks Will try to cook up a patch in my copious free time. Neil

On Wed, Oct 14, 2009 at 3:16 PM, Neil Schemenauer <nas@arctrix.com> wrote:
The current shutdown code in pythonrun.c zaps module globals by setting them to None (an attempt to break reference cycles). That causes problems since __del__ methods can try to use the globals after they have been set to None.
The procedure implemented by http://bugs.python.org/issue812369 seems to be a better idea. References to modules are replaced by weak references and the GC is allowed to cleanup reference cycles.
Based on the description, it still resorts to zapping module globals by setting them to None. It zaps them to weakrefs first, which means that globals are more likely to be valid during __del__, but it still cannot make any guarantees and referencing globals from __del__ is still a bad idea. Is that a correct synopsis? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Wed, Oct 14, 2009 at 04:13:12PM -0500, Daniel Stutzbach wrote:
Based on the description, it still resorts to zapping module globals by setting them to None. It zaps them to weakrefs first, which means that globals are more likely to be valid during __del__, but it still cannot make any guarantees and referencing globals from __del__ is still a bad idea. Is that a correct synopsis?
Yes, it does still resort to setting globals to None. However, the weakref step makes it much more likely that __del__ methods run before that happens. After this change, referencing global variables from __del__ methods is okay. What is not a good idea is creating __del__ methods that are part of a reference cycle (i.e. an island of references) and also refer to that cycle somehow. That has never been a good idea and those __del__ methods will never get run, before or after the proposed change. HTH, Neil

On Wed, Oct 14, 2009 at 4:42 PM, Neil Schemenauer <nas@arctrix.com> wrote:
Yes, it does still resort to setting globals to None. However, the weakref step makes it much more likely that __del__ methods run before that happens. After this change, referencing global variables from __del__ methods is okay.
The first and last sentences seem like a contradiction to me. If I cannot guarantee that globals will be valid when __del__ is executed, then I must not reference globals from __del__. I think I'm missing something here. Is there some way the programmer can determine that referencing a global variable in __del__ will be 100% safe? (not just "likely") -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Wed, Oct 14, 2009 at 05:35:38PM -0500, Daniel Stutzbach wrote:
The first and last sentences seem like a contradiction to me. If I cannot guarantee that globals will be valid when __del__ is executed, then I must not reference globals from __del__.
I think there is confusion about the word "reference". In the method: def __del__(): print sys.version the global variable reference to 'sys' is not a reference on the GC referencing counting sense. IOW, it does not result in a a Py_INCREF while the function is not being executed and therefore should be safe after the proposed change. Currently, it could result in 'None' being printed. HTH, Neil

On Wed, Oct 14, 2009 at 6:05 PM, Neil Schemenauer <nas@arctrix.com> wrote:
def __del__(): print sys.version
the global variable reference to 'sys' is not a reference on the GC referencing counting sense. IOW, it does not result in a a Py_INCREF while the function is not being executed and therefore should be safe after the proposed change. Currently, it could result in 'None' being printed.
Currently it throws an exception since "sys" is None. :-) Here is my understanding of the proposed procedure: 1. Replace modules in sys.modules with weakrefs 2. Run the garbage collector 3. Replace globals in any remaining modules with None 4. Run the garbage collector Is it possible for a __del__ method to be called in step 4 or not? I am still unclear on this point. :-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

Le Wed, 14 Oct 2009 18:27:37 -0500, Daniel Stutzbach a écrit :
Here is my understanding of the proposed procedure:
1. Replace modules in sys.modules with weakrefs 2. Run the garbage collector 3. Replace globals in any remaining modules with None 4. Run the garbage collector
Is it possible for a __del__ method to be called in step 4 or not? I am still unclear on this point. :-)
If an object is holding a reference to a module (e.g. self._sys = sys), then yes it should be possible because the reference has been creating a cycle.

On Wed, Oct 14, 2009 at 02:16:35PM -0600, Neil Schemenauer wrote:
The procedure implemented by http://bugs.python.org/issue812369 seems to be a better idea.
After some experimentation I realize this idea is not ready yet. The main problem comes from references to Python objects that modules keep but don't expose to the garbage collector. For example, gcmodule.c has a static pointer "tmod" that is a reference to the "time" module. This reference prevents the "time" module from being freed during interpreter shutdown. Ideally, I suppose modules should be treated like any other object and have tp_traverse and tp_clear methods that deal with these sorts of pointers. They would have to delegated to the instance since each module would have its own implementation. Neil

2009/10/16 Neil Schemenauer <nas@arctrix.com>:
After some experimentation I realize this idea is not ready yet. The main problem comes from references to Python objects that modules keep but don't expose to the garbage collector. For example, gcmodule.c has a static pointer "tmod" that is a reference to the "time" module. This reference prevents the "time" module from being freed during interpreter shutdown.
Ideally, I suppose modules should be treated like any other object and have tp_traverse and tp_clear methods that deal with these sorts of pointers. They would have to delegated to the instance since each module would have its own implementation.
Note since python 3.0 (and PEP 3121), the PyModuleDef structure has some members like m_traverse, m_clear and m_free for this very purpose. So far, nobody cared to implement these methods for any module. Maybe one should start at least for static PyObject* that contain references to modules. -- Amaury Forgeot d'Arc

2009/10/16 Amaury Forgeot d'Arc <amauryfa@gmail.com>:
2009/10/16 Neil Schemenauer <nas@arctrix.com>:
After some experimentation I realize this idea is not ready yet. The main problem comes from references to Python objects that modules keep but don't expose to the garbage collector. For example, gcmodule.c has a static pointer "tmod" that is a reference to the "time" module. This reference prevents the "time" module from being freed during interpreter shutdown.
Ideally, I suppose modules should be treated like any other object and have tp_traverse and tp_clear methods that deal with these sorts of pointers. They would have to delegated to the instance since each module would have its own implementation.
Note since python 3.0 (and PEP 3121), the PyModuleDef structure has some members like m_traverse, m_clear and m_free for this very purpose. So far, nobody cared to implement these methods for any module. Maybe one should start at least for static PyObject* that contain references to modules.
I believe the implementation is buggy because modules (and their states) can easily be finalized before the objects contained in them. For example, when I tried to convert the _io module to use a state, it resulted in segfaults when the dealloc methods of objects tried to use objects in a state, which had already been deallocated. -- Regards, Benjamin
participants (7)
-
Amaury Forgeot d'Arc
-
Antoine Pitrou
-
Benjamin Peterson
-
Daniel Stutzbach
-
exarkun@twistedmatrix.com
-
Glyph Lefkowitz
-
Neil Schemenauer