[Python-Dev] Issue 10194 - Adding a gc.remap() function
Peter Ingebretson
pingebre at yahoo.com
Tue Oct 26 19:11:00 CEST 2010
--- On Tue, 10/26/10, Hrvoje Niksic <hrvoje.niksic at avl.com> wrote:
> What about objects that don't implement tp_traverse because
> they cannot take part in cycles?
A significant majority of objects that can hold references to other
objects can take part in cycles and do implement tp_traverse. My
original thought was that modifying any references not visible to
the cyclic GC would be out of the scope of gc.remap.
Even adding a 'tp_extended_traverse' method might not help solve
this problem because untracked objects are not in any generation list,
so there is no general way to find all of them.
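This gap can be seen from Python with the real `gc.is_tracked()` and `gc.get_objects()` functions. Objects of types that cannot take part in cycles never enter a generation list, so walking `gc.get_objects()` cannot reach them. The following is a minimal illustration and assumes a standard CPython build:

```python
import gc

# Objects that cannot take part in reference cycles (ints, strs, ...) are
# not tracked by the cyclic GC, so they sit in no generation list.
print(gc.is_tracked(42))      # False
print(gc.is_tracked("spam"))  # False
print(gc.is_tracked([]))      # True

# gc.get_objects() only walks the generation lists, so a live but
# untracked object cannot be found through it.
marker = "an untracked string"
print(any(o is marker for o in gc.get_objects()))  # False
```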
> Changing immutable objects such as tuples and frozensets
> doesn't exactly sound appealing.
My original Python-only approach cloned immutable objects that
referenced objects that were to be remapped, and then added the
old and new immutable object to the mapping. This worked well,
although it was somewhat complicated because it had to happen in
dependency order (e.g., to handle tuples of tuples in frozensets).
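Below is a minimal Python sketch of that cloning pass. It is not the original code: the name `remap` and the mapping layout are illustrative. It assumes a mapping keyed by `id()` that stores `(old, new)` pairs, which keeps the old objects alive. Children are rebuilt before their parents, giving the dependency order described above. Only exact `tuple` and `frozenset` instances are cloned.

```python
def remap(obj, mapping):
    """Return obj with remapped references, cloning immutable containers.

    mapping: id(old) -> (old, new). Keeping `old` in the value keeps it
    alive, so its id() cannot be reused while the mapping exists.
    """
    hit = mapping.get(id(obj))
    if hit is not None:
        return hit[1]
    if type(obj) in (tuple, frozenset):
        # Remap children first: tuples of tuples inside frozensets are
        # rebuilt bottom-up, in dependency order.
        items = [remap(item, mapping) for item in obj]
        if any(new is not old for new, old in zip(items, obj)):
            clone = type(obj)(items)
            mapping[id(obj)] = (obj, clone)  # later lookups reuse the clone
            return clone
    return obj


class OldWidget:
    pass


class NewWidget:
    pass


mapping = {id(OldWidget): (OldWidget, NewWidget)}
inner = (OldWidget, 1)
data = frozenset({(inner, "x"), 2})
result = remap(data, mapping)
print(result == frozenset({((NewWidget, 1), "x"), 2}))  # True
print(remap(inner, mapping) is remap(inner, mapping))   # True
```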
I thought about keeping this, but I am now convinced that as long
as you are doing something as drastic as changing references in the
heap, you may as well change immutable objects.
The main argument is that preserving immutable objects increases the
complexity of remapping and does not actually solve many problems.
The primary reason for objects to be immutable is so that their
comparison operators and hash value can remain consistent. Changing,
for example, the contents of a tuple that a dictionary key references
has the same effect as changing the identity of the tuple -- both
modify the hash value of the key and thus invalidate the dictionary.
The full reload process needs to rehash collections invalidated by
changing hash values anyway, so we might as well modify the contents of tuples.
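The small demonstration below shows why rehashing is needed. It uses a hypothetical `Key` class whose hash depends on a mutable field, standing in for a tuple whose contents were remapped in place. Once the field changes, neither the old value nor the new one can find the entry. Rebuilding the dictionary from its items restores lookups.

```python
class Key:
    """Hypothetical key whose hash depends on a mutable field, standing in
    for a tuple whose contents were remapped in place."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


k = Key("old")
table = {k: "value"}
k.name = "new"                # the dict still holds the hash of "old"
print(Key("new") in table)    # False: probes with the new hash
print(Key("old") in table)    # False: hash matches, but keys compare unequal
table = dict(table.items())   # rehash by rebuilding from the current keys
print(Key("new") in table)    # True
```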
> > the signature of visitproc has been modified to take (PyObject **)
> > instead of (PyObject *) so that a visitor can modify fields
> > visited with Py_VISIT.
>
> This sounds like a bad idea -- visitproc is not limited to
> visiting struct members. Visited objects can be stored
> in data structures where their address cannot be directly
> obtained.
>
> If you want to go this route, rather create an extended
> visit procedure (visitchangeproc?) that accepts a function
> that can change the reference. A convenience function
> or macro could implement this for the common case of struct
> member or PyObject**.
This is a compelling argument. I considered adding an extended
traverse / visit path, but decided against it after not finding
any cases in the base distribution that required it. The
disadvantage of creating an additional method is that C types will
have yet another method to implement for the gc (tp_traverse,
tp_clear, and now tp_traverse_modify(?)). On the other hand, you've
convinced me that this is necessary in some cases, so it might as
well be used in all of them. Jon Parise also pointed out in a
private communication that a separate method avoids the minor
performance impact my change has on tp_traverse, which is another
advantage.
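Here is a rough Python model of the extended visit procedure Hrvoje describes. It assumes the visitor receives the current referent plus a setter callback. The names `traverse_modify`, `visit_attr` and `remap_visitor` are hypothetical, and real C types would pass a function pointer rather than a closure. `Pair` uses the convenience helper for plain fields. `Packed` keeps its references in storage that has no per-reference slot, so it supplies its own setter.

```python
def visit_attr(obj, name, visit):
    """Convenience helper for the common case: a reference held in a plain
    attribute (the analogue of a PyObject ** struct member)."""
    visit(getattr(obj, name), lambda new: setattr(obj, name, new))


class Pair:
    """Both references live in ordinary fields."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def traverse_modify(self, visit):
        visit_attr(self, "first", visit)
        visit_attr(self, "second", visit)


class Packed:
    """References live inside an immutable tuple, so there is no field
    address to hand out; the setter rebuilds the storage instead."""

    def __init__(self, *items):
        self._items = tuple(items)

    def traverse_modify(self, visit):
        for i, item in enumerate(self._items):
            def set_item(new, i=i):
                items = list(self._items)
                items[i] = new
                self._items = tuple(items)
            visit(item, set_item)


def remap_visitor(mapping):
    """Visitor that swaps any referent found in `mapping`
    (id(old) -> (old, new)) for its replacement."""
    def visit(value, set_value):
        hit = mapping.get(id(value))
        if hit is not None:
            set_value(hit[1])
    return visit


old, new = object(), object()
mapping = {id(old): (old, new)}
p = Pair(old, 1)
q = Packed(2, old)
for container in (p, q):
    container.traverse_modify(remap_visitor(mapping))
print(p.first is new, q._items == (2, new))  # True True
```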
If a 'tp_traverse_modify' function were added, many types could
replace their custom tp_clear function with a generic implementation
built on a visitchangeproc, which partly offsets the cost of adding
another method.
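Below is a sketch of such a generic tp_clear, modeled in Python. The `traverse_modify(visit)` method is hypothetical: it calls `visit(value, set_value)` for each reference, and every reported reference is overwritten with `None`, standing in for setting the field to NULL (Py_CLEAR in C). `Node` and `generic_clear` are illustrative names only.

```python
def generic_clear(obj):
    """Hypothetical generic clear: drop every reference the object reports
    through its modifying traversal."""
    obj.traverse_modify(lambda value, set_value: set_value(None))


class Node:
    """Hypothetical container with two reference fields."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

    def traverse_modify(self, visit):
        for name in ("value", "next"):
            # Bind `name` as a default so each setter targets its own field.
            visit(getattr(self, name),
                  lambda new, name=name: setattr(self, name, new))


# Build a two-node reference cycle, then clear one node to break it.
a = Node("a")
b = Node("b", a)
a.next = b
generic_clear(a)
print(a.value, a.next)  # None None
print(b.next is a)      # True
```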