Issue 10194 - Adding a gc.remap() function

I have a patch that adds a new function to the gc module. The gc.remap() function uses the tp_traverse mechanism to find all references to any keys in a provided mapping, and remaps these references in-place to instead point to the value corresponding to each key.

The motivation for adding this method is to enable writing a module that provides an enhanced version of imp.reload. The builtin reload function is very useful for iterating on a single module within the Python interpreter shell, but in more complex situations it is very limited. In particular, instances of classes declared in the reloaded module will continue to reference the old versions of the classes, and other modules that imported elements of the old module using the 'from ... import ...' syntax will continue to refer to the stale version of the functions or classes that they imported.

The gc.remap() function enables writing a new version of reload which uses imp.reload to reload a module and then replaces all references to stale objects from the old module to instead point to equivalent newly defined objects. This still has many limitations; for instance, if an __init__ function has been changed, the new __init__ will not be run on old instances. On the other hand, in many cases this is sufficient to continue iterating on code without needing to restart the Python environment, which can be a significant time savings.

I initially tried to implement this reloading strategy entirely in Python using gc.get_referrers() to find references to objects defined in the old module, but I found it was too difficult to reliably replace references in objects once they had been found. Since the GC already has a way to find all fields that refer to objects, it seemed fairly straightforward to extend that mechanism to additionally modify references.

This reloading strategy is documented in more detail here: http://doublestar.org/in-place-python-reloading/

A potentially controversial aspect of this change is that the signature of the visitproc has been modified to take (PyObject **) as an argument instead of (PyObject *) so that a visitor can modify fields visited with Py_VISIT. A few traverse functions in the standard library also had to be changed to use Py_VISIT on the actual members rather than on aliased pointers.

I also have a prototype of an enhanced reload function using gc.remap. This is only a partial implementation of the proposal; in particular, it does not rehash dictionaries that have been invalidated as a result of reloading, and it does not support custom __reload__ hooks. A link to the code as well as some examples are here: http://doublestar.org/python-hot-loading-prototype/

Please let me know if you have any feedback on the reloading proposal, the hot loading prototype, or on the patch.

Thanks, Peter
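The stale-reference problem described in this message can be reproduced in a few lines. This is only a sketch: the module name demo_mod, its contents, and the helper run_in_module are made up for illustration, and the reload is simulated by re-running source in the existing module namespace (which is what reload does) rather than by calling imp.reload on a file.

```python
import types

OLD_SOURCE = '''
class Greeter:
    def greet(self):
        return "old"

def helper():
    return "old"
'''
NEW_SOURCE = OLD_SOURCE.replace('"old"', '"new"')


def run_in_module(module, source):
    # Stand-in for reload: execute the source again in the same namespace.
    exec(compile(source, module.__name__, "exec"), module.__dict__)


mod = types.ModuleType("demo_mod")  # hypothetical module
run_in_module(mod, OLD_SOURCE)

instance = mod.Greeter()  # instance created before the reload
helper = mod.helper       # what "from demo_mod import helper" would bind

run_in_module(mod, NEW_SOURCE)  # the simulated reload

results = {
    "fresh_lookup": mod.helper(),      # looked up through the module
    "from_import": helper(),           # stale function object
    "old_instance": instance.greet(),  # instance still uses the old class
    "instance_is_new_class": isinstance(instance, mod.Greeter),
}
print(results)
```

Only the lookup that goes through the module sees the new code; the imported name and the existing instance keep pointing at the old objects, which is the gap gc.remap() is meant to close.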

On 10/26/2010 07:04 AM, Peter Ingebretson wrote:
I have a patch that adds a new function to the gc module. The gc.remap() function uses the tp_traverse mechanism to find all references to any keys in a provided mapping, and remaps these references in-place to instead point to the value corresponding to each key.
What about objects that don't implement tp_traverse because they cannot take part in cycles? Changing immutable objects such as tuples and frozensets doesn't exactly sound appealing.
A potentially controversial aspect of this change is that the signature of the visitproc has been modified to take (PyObject **) as an argument instead of (PyObject *) so that a visitor can modify fields visited with Py_VISIT.
This sounds like a bad idea -- visitproc is not limited to visiting struct members. Visited objects can be stored in data structures where their address cannot be directly obtained. For example, in C++, you could have an std::map with PyObject* keys, and it wouldn't be legal to pass addresses of those. Various C++ bindings also implement smart_ptr-style wrappers over PyObject* that handle Python reference counting, and those will also have problems with visitproc taking PyObject **. And this is not just some oddball C++ thing. Many extensions wrap arbitrary C objects which can reach Python data through other C objects, which expose the PyObject* only through a generic "void *get_user_data()"-style accessor. For such objects to cooperate with the GC, they must be able to visit arbitrary PyObject pointers without knowing their address. PyGTK and pygobject are the obvious examples of this, but I'm sure there are many others. If you want to go this route, rather create an extended visit procedure (visitchangeproc?) that accepts a function that can change the reference. A convenience function or macro could implement this for the common case of struct member or PyObject**.
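To make the suggestion concrete, here is a rough Python model of an extended visit procedure that receives a setter function along with each reference. It is not the C API: the names traverse_change, Node, Registry and remap are invented for this sketch. Node stands in for the common "struct member" case, while Registry stands in for a container whose references have no addressable slot.

```python
class Node:
    """Common case: the reference lives in a plain member."""

    def __init__(self, child):
        self.child = child

    def traverse_change(self, visit):
        # The setter plays the role of a convenience macro for struct members.
        visit(self.child, lambda new: setattr(self, "child", new))


class Registry:
    """References are set members, so there is no slot whose address could be passed."""

    def __init__(self, *members):
        self._members = set(members)

    def members(self):
        return set(self._members)

    def traverse_change(self, visit):
        updated = set()
        for member in self._members:
            slot = [member]

            def replace(new, slot=slot):
                slot[0] = new

            visit(member, replace)
            updated.add(slot[0])
        # The container decides how to store the (possibly replaced) references.
        self._members = updated


def remap(containers, mapping):
    """Replace references to mapping keys with the corresponding values."""
    by_id = {id(old): new for old, new in mapping.items()}

    def visit(ref, set_ref):
        if id(ref) in by_id:
            set_ref(by_id[id(ref)])

    for container in containers:
        container.traverse_change(visit)


class Thing:
    pass


old_obj, new_obj = Thing(), Thing()
node = Node(old_obj)
registry = Registry(old_obj)
remap([node, registry], {old_obj: new_obj})
print(node.child is new_obj, registry.members() == {new_obj})
```

The visitor never needs the address of a reference; each container supplies whatever replacement logic fits its own storage.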

--- On Tue, 10/26/10, Hrvoje Niksic <hrvoje.niksic@avl.com> wrote:
What about objects that don't implement tp_traverse because they cannot take part in cycles?
A significant majority of objects that can hold references to other objects can take part in cycles and do implement tp_traverse. My original thought was that modifying any references not visible to the cyclic GC would be out of the scope of gc.remap. Even adding a 'tp_extended_traverse' method might not help solve this problem because untracked objects are not in any generation list, so there is no general way to find all of them.
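As a small illustration of which objects the cyclic GC can see, gc.is_tracked() reports whether the collector tracks a given object. Atomic objects such as ints and strings are not tracked, while containers such as lists are; the variable name tracked is just for this sketch.

```python
import gc

tracked = {
    "int": gc.is_tracked(0),     # atomic, cannot reference other objects
    "str": gc.is_tracked("a"),   # atomic as well
    "list": gc.is_tracked([]),   # container, visible to the collector
}
print(tracked)
```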
Changing immutable objects such as tuples and frozensets doesn't exactly sound appealing.
My original Python-only approach cloned immutable objects that referenced objects that were to be remapped, and then added the old and new immutable object to the mapping. This worked well, although it was somewhat complicated because it had to happen in dependency order (e.g., to handle tuples of tuples in frozensets). I thought about keeping this, but I am now convinced that as long as you are doing something as drastic as changing references in the heap you may as well change immutable objects. The main argument is that preserving immutable objects increases the complexity of remapping and does not actually solve many problems. The primary reason for objects to be immutable is so that their comparison operators and hash value can remain consistent. Changing, for example, the contents of a tuple that a dictionary key references has the same effect as changing the identity of the tuple -- both modify the hash value of the key and thus invalidate the dictionary. The full reload process needs to rehash collections invalidated by hash values changing, so we might as well modify the contents of tuples.
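The effect of a key's hash changing underneath a dictionary can be shown without touching tuples at all. In this sketch the Version class is a made-up stand-in for any key whose hash depends on state that a remap would alter; the "rehash" step simply rebuilds the dictionary so that every key is hashed again.

```python
class Version:
    """Hypothetical key type whose hash follows a mutable field."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return isinstance(other, Version) and self.number == other.number

    def __hash__(self):
        return hash(self.number)


key = Version(1)
table = {key: "payload"}

key.number = 2  # the key's hash changes while it sits in the dictionary

# A lookup with an equal key now probes under the new hash and misses,
# because the entry is still filed under the old one.
found_before_rehash = Version(2) in table

# Rebuilding the dictionary hashes every key again.
table = {k: v for k, v in table.items()}
found_after_rehash = Version(2) in table

print(found_before_rehash, found_after_rehash)
```

Whether the hash changes because a tuple's contents were rewritten or because the tuple was swapped for another one, the dictionary ends up in the same inconsistent state until it is rebuilt.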
the signature of visitproc has been modified to take (PyObject **) instead of (PyObject *) so that a visitor can modify fields visited with Py_VISIT.
This sounds like a bad idea -- visitproc is not limited to visiting struct members. Visited objects can be stored in data structures where their address cannot be directly obtained.
If you want to go this route, rather create an extended visit procedure (visitchangeproc?) that accepts a function that can change the reference. A convenience function or macro could implement this for the common case of struct member or PyObject**.
This is a compelling argument. I considered adding an extended traverse / visit path, but decided against it after not finding any cases in the base distribution that required it. The disadvantage of creating an additional method is that C types will have yet another method to implement for the gc (tp_traverse, tp_clear, and now tp_traverse_modify(?)). On the other hand, you've convinced me that this is necessary in some cases, so it might as well be used in all of them. Jon Parise also pointed out in a private communication that this eliminates the minor performance impact on tp_traverse, which is another advantage over my change. If a 'tp_traverse_modify' function were added, many types could replace their custom tp_clear function with a generic method that makes use of (visitchangeproc), which somewhat mitigates adding another method.

On 10/26/2010 07:11 PM, Peter Ingebretson wrote:
The main argument is that preserving immutable objects increases the complexity of remapping and does not actually solve many problems. The primary reason for objects to be immutable is so that their comparison operators and hash value can remain consistent.
There are other reasons as well (thread-safety), but I guess those don't really apply to python. I guess one could defend the position that the tuple hasn't really "changed" if its elements merely get upgraded in this way, but it still feels wrong.
Changing, for example, the contents of a tuple that a dictionary key references has the same effect as changing the identity of the tuple -- both modify the hash value of the key and thus invalidate the dictionary. The full reload process needs to rehash collections invalidated by hash values changing, so we might as well modify the contents of tuples.
Do you also rehash when tuples of upgraded objects are used as dict keys?

2010/10/26 Peter Ingebretson <pingebre@yahoo.com>:
I have a patch that adds a new function to the gc module. The gc.remap() function uses the tp_traverse mechanism to find all references to any keys in a provided mapping, and remaps these references in-place to instead point to the value corresponding to each key.
The motivation for adding this method is to enable writing a module that provides an enhanced version of imp.reload. The builtin reload function is very useful for iterating on a single module within the Python interpreter shell, but in more complex situations it is very limited.
Is there any reason that you'd want to do this? ...
http://doublestar.org/python-hot-loading-prototype/
Please let me know if you have any feedback on the reloading proposal, the hot loading prototype, or on the patch.
Overall, I think this adds lots of backwards incompatible code for an obscure use-case that will cause subtle and complicated bugs. So, -1. -- Regards, Benjamin

--- On Tue, 10/26/10, Benjamin Peterson <benjamin@python.org> wrote:
Is there any reason that you'd want to do this?
I have a relatively large application written in Python, and a specific use case where it will significantly increase our speed of iteration to be able to change and test modules without needing to restart the application. We have experimented with different approaches to reloading and this one seems the most promising by a wide margin.
Overall, I think this adds lots of backwards incompatible code for an obscure use-case that will cause subtle and complicated bugs. So, -1.
Would you still object to the change if (visitproc), Py_VISIT and tp_traverse were reverted to their previous state, and a separate path was added for modifying references using (visitchangeproc), Py_VISIT_CHANGE, and tp_traverse_change? Thanks, Peter

Am 26.10.2010 19:24, schrieb Peter Ingebretson:
--- On Tue, 10/26/10, Benjamin Peterson <benjamin@python.org> wrote:
Is there any reason that you'd want to do this?
I have a relatively large application written in Python, and a specific use case where it will significantly increase our speed of iteration to be able to change and test modules without needing to restart the application. We have experimented with different approaches to reloading and this one seems the most promising by a wide margin.
I think this then mandates a PEP; I'm -1 on the feature also. In the PEP, you should explain what the alternatives are and why they don't work in the use cases of this feature. For example, I wonder why just changing the classes to have the updated methods isn't good enough. Or, if you want to be able to change the class of all instances of that class, why you can't track all instances explicitly. In the PEP, you should then also explain what the limitations of the feature should be. I.e. the feature should not be specified by its implementation, but have some abstract specification. Regards, Martin
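For reference, the "track all instances explicitly" alternative could look roughly like the sketch below. The registry _live, the base class Tracked, the function retarget and the two Widget classes are all hypothetical names; the approach only covers classes that opt in by inheriting from the tracking base, and it relies on __class__ assignment, which requires compatible object layouts.

```python
import weakref

_live = weakref.WeakSet()  # hypothetical registry of live instances


class Tracked:
    """Base class that records each instance without keeping it alive."""

    def __init__(self):
        _live.add(self)


def retarget(old_cls, new_cls):
    """Point every tracked instance of old_cls at new_cls; return the count."""
    moved = 0
    for obj in list(_live):
        if type(obj) is old_cls:
            obj.__class__ = new_cls
            moved += 1
    return moved


class WidgetV1(Tracked):
    def describe(self):
        return "v1"


class WidgetV2(Tracked):
    def describe(self):
        return "v2"


widget = WidgetV1()
moved = retarget(WidgetV1, WidgetV2)
print(moved, widget.describe())
```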

--- On Tue, 10/26/10, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I think this then mandates a PEP; I'm -1 on the feature also.
I am happy to write up a PEP for this feature. I'll start that process now, though if anyone feels that this idea has no chance of acceptance please let me know. Thanks, Peter

Am 26.10.2010 22:28, schrieb Peter Ingebretson:
--- On Tue, 10/26/10, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I think this then mandates a PEP; I'm -1 on the feature also.
I am happy to write up a PEP for this feature. I'll start that process now, though if anyone feels that this idea has no chance of acceptance please let me know.
If it could actually work in a reasonable way, I would be +0. If, as I think, it can't possibly work correctly, I'll be -1. In this evaluation, I compare this to Smalltalk's Object>>#become: What you propose should have a similar effect, IMO, although it's probably not necessary to provide the two-way nature of become: Regards, Martin

--- On Tue, 10/26/10, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I think this then mandates a PEP; I'm -1 on the feature also.
I am happy to write up a PEP for this feature. I'll start that process now, though if anyone feels that this idea has no chance of acceptance please let me know.
If it could actually work in a reasonable way, I would be +0. If, as I think, it can't possibly work correctly, I'll be -1.
In this evaluation, I compare this to Smalltalk's Object>>#become: What you propose should have a similar effect, IMO, although it's probably not necessary to provide the two-way nature of become.
Thanks, I didn't know about Object>>#become until now but it is a perfect comparison. The two-way nature of become appears to be due to the implementation detail of swapping two entries in a table, but the current spec for gc.remap can achieve the same effect with:
gc.remap({a:b, b:a})
Of course #become and gc.remap also share the same power and danger. I'm retracting the patch in 10194 and will submit a new one later as part of the PEP that uses a parallel traverse mechanism. Still, if you are concerned that this approach cannot work I encourage you to try out the patch associated with 10194 by playing around with gc.remap in the interpreter or looking at the unit tests. I was surprised when I made the change initially by how little code was required and by how well it seemed to work in practice. Thanks, Peter
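The swap semantics of gc.remap({a:b, b:a}) depend on every reference being looked up once against the original mapping. The sketch below models that with a hypothetical remap_in function that only touches the lists and dict values it is handed; the proposed gc.remap would instead walk the whole heap via tp_traverse.

```python
def remap_in(containers, mapping):
    """Replace references in the given lists and dict values, in place."""
    # Key by identity so each reference is translated exactly once.
    by_id = {id(old): new for old, new in mapping.items()}
    for container in containers:
        if isinstance(container, list):
            for index, item in enumerate(container):
                container[index] = by_id.get(id(item), item)
        elif isinstance(container, dict):
            for key, value in list(container.items()):
                container[key] = by_id.get(id(value), value)


class Obj:
    def __init__(self, name):
        self.name = name


a, b = Obj("a"), Obj("b")
items = [a, b, a]
table = {"first": a, "second": b}

remap_in([items, table], {a: b, b: a})  # two-way swap, as with become:

names = [obj.name for obj in items]
print(names, table["first"].name, table["second"].name)
```

Because each reference is translated once, a maps to b and b maps to a at the same time, rather than everything collapsing onto one object.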

Peter Ingebretson <pingebre@yahoo.com> wrote:
I am happy to write up a PEP for this feature. I'll start that process now, though if anyone feels that this idea has no chance of acceptance please let me know.
I think a feature that allows modules to be more reliably reloaded could be accepted. Martin's suggestion sounds like it could be useful. I would recommend trying to limit the scope of the feature and clearly define what it intends to achieve (e.g. use cases). The idea of replacing references does not seem to have much hope, IMHO. It presents all kinds of subtle problems. Dictionary hashing is only one of many invariants that could be broken by blindly replacing references. You have no way of knowing what other invariants are expected or if the new objects will satisfy them. Also, there would have to be a very compelling reason to change the signature of "visitproc". Every Python module that participates in GC would have to be modified as a result of the signature change. Regards, Neil

--- On Tue, 10/26/10, Neil Schemenauer <nas@arctrix.com> wrote:
I am happy to write up a PEP for this feature. I'll start that process now, though if anyone feels that this idea has no chance of acceptance please let me know.
I think a feature that allows modules to be more reliably reloaded could be accepted. Martin's suggestion sounds like it could be useful. I would recommend trying to limit the scope of the feature and clearly define what it intends to achieve (e.g. use cases).
The idea of replacing references does not seem to have much hope, IMHO.
I agree that the important feature is module reloading; whether it is implemented via remapping references or by replacing the state of existing objects is an implementation detail. I will try to keep the scope of the PEP focused, and if necessary I will split it up into two. Thanks, Peter

On 08:28 pm, pingebre@yahoo.com wrote:
--- On Tue, 10/26/10, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I think this then mandates a PEP; I'm -1 on the feature also.
I am happy to write up a PEP for this feature. I'll start that process now, though if anyone feels that this idea has no chance of acceptance please let me know.
This can be implemented with ctypes right now (I half did it several years ago). Jean-Paul

--- On Tue, 10/26/10, P.J. Eby <pje@telecommunity.com> wrote:
If all you really want this for is reloading, it would probably make more sense to simply modify the existing class and function objects using the reloaded values as a template, then save the modified classes and functions back to the module.
Have you tried http://pypi.python.org/pypi/plone.reload or http://svn.python.org/projects/sandbox/trunk/xreload/xreload.py, or any other existing code reloaders, or tried extending them for your specific use case?
I've investigated several reloading frameworks, including the ones you mention as well as http://code.google.com/p/reimport/ and http://code.google.com/p/livecoding/. The approach of using the gc to remap references seemed to have the fewest overall limitations, but requiring C API changes is a big downside. I'm going to have to do a more detailed comparison of the features offered by each approach.
--- On Tue, 10/26/10, exarkun@twistedmatrix.com <exarkun@twistedmatrix.com> wrote:
This can be implemented with ctypes right now (I half did it several years ago).
Jean-Paul
Is there a trick to doing it this way, or are you suggesting building a ctypes wrapper for each C type in the Python library, and then effectively reimplementing tp_traverse in Python?

On 26 Oct, 11:31 pm, pingebre@yahoo.com wrote:
--- On Tue, 10/26/10, exarkun@twistedmatrix.com <exarkun@twistedmatrix.com> wrote:
This can be implemented with ctypes right now (I half did it several years ago).
Jean-Paul
Is there a trick to doing it this way, or are you suggesting building a ctypes wrapper for each C type in the Python library, and then effectively reimplementing tp_traverse in Python?
That's the idea, yes. Jean-Paul

At 10:24 AM 10/26/2010 -0700, Peter Ingebretson wrote:
I have a relatively large application written in Python, and a specific use case where it will significantly increase our speed of iteration to be able to change and test modules without needing to restart the application.
If all you really want this for is reloading, it would probably make more sense to simply modify the existing class and function objects using the reloaded values as a template, then save the modified classes and functions back to the module. Have you tried http://pypi.python.org/pypi/plone.reload or http://svn.python.org/projects/sandbox/trunk/xreload/xreload.py, or any other existing code reloaders, or tried extending them for your specific use case?
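A minimal sketch of that "use the reloaded values as a template" idea follows. The helpers update_function and update_class are hypothetical and are not taken from plone.reload or xreload; they only handle plain attributes and functions, and they ignore complications such as closures, metaclasses and __slots__.

```python
import types


def update_function(old_func, new_func):
    """Make the existing function object behave like the reloaded one."""
    old_func.__code__ = new_func.__code__
    old_func.__defaults__ = new_func.__defaults__
    old_func.__kwdefaults__ = new_func.__kwdefaults__
    old_func.__doc__ = new_func.__doc__


def _copyable(name, value):
    # Leave special attributes alone unless they are ordinary methods.
    is_dunder = name.startswith("__") and name.endswith("__")
    return not is_dunder or isinstance(value, types.FunctionType)


def update_class(old_cls, new_cls):
    """Copy the reloaded class body onto the existing class object."""
    new_attrs = {n: v for n, v in vars(new_cls).items() if _copyable(n, v)}
    for name, value in list(vars(old_cls).items()):
        if _copyable(name, value) and name not in new_attrs:
            delattr(old_cls, name)  # attribute was removed in the new version
    for name, value in new_attrs.items():
        setattr(old_cls, name, value)


def greet_old(name="x"):
    return "hello " + name


def greet_new(name="y"):
    return "hi " + name


class Old:
    def speak(self):
        return "old"

    def legacy(self):
        return "removed in the new version"


class New:
    def speak(self):
        return "new"


alias = greet_old  # a reference held elsewhere, e.g. via "from ... import"
obj = Old()        # an instance created before the reload

update_function(greet_old, greet_new)
update_class(Old, New)
print(alias(), obj.speak(), hasattr(Old, "legacy"))
```

Since the old objects keep their identity, existing references and instances pick up the new behavior without any reference having to be rewritten.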
participants (7)
- "Martin v. Löwis"
- Benjamin Peterson
- exarkun@twistedmatrix.com
- Hrvoje Niksic
- Neil Schemenauer
- P.J. Eby
- Peter Ingebretson