[Python-Dev] Is there any remaining reason why weakref callbacks shouldn't be able to access the referenced object?

Sat Oct 22 02:05:56 EDT 2016

On Fri, Oct 21, 2016 at 8:32 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 21 October 2016 at 17:09, Nathaniel Smith <njs at pobox.com> wrote:
>> But that was 2.4. In the meantime, of course, PEP 442 fixed it so
>> that finalizers and weakrefs mix just fine. In fact, weakref callbacks
>> are now run *before* __del__ methods [2], so clearly it's now okay for
>> arbitrary code to touch the objects during that phase of the GC -- at
>> least in principle.
>>
>> So what I'm wondering is, would anything terrible happen if we started
>> passing still-live weakrefs into weakref callbacks, and then clearing
>> them afterwards?
>
> The weakref-before-__del__ ordering change in
> https://www.python.org/dev/peps/pep-0442/#disposal-of-cyclic-isolates
> only applies to cyclic garbage collection, so for normal
> refcount-driven object cleanup in CPython, the __del__ still happens first:
>
>     >>> class C:
>     ...     def __del__(self):
>     ...         print("__del__ called")
>     ...
>     >>> c = C()
>     >>> import weakref
>     >>> def cb():
>     ...     print("weakref callback called")
>     ...
>     >>> weakref.finalize(c, cb)
>     <finalize object at 0x7f4300b710a0; for 'C' at 0x7f42f8ae3470>
>     >>> del c
>     __del__ called
>     weakref callback called

Ah, interesting! And in the old days this was of course the right way
to do it, because until __del__ has completed it's possible that the
object will get resurrected, and you don't want to clear the weakref
until you're certain that it's dead.
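
To make the refcount-driven case concrete, here is a minimal sketch,
assuming CPython 3.4 or later. The Phoenix class, its survivors list and
the events list are names made up for illustration. It resurrects an
object from __del__ and checks that the weakref stays live, and its
callback unfired, until the object really goes away:

```python
import weakref

class Phoenix:
    survivors = []

    def __del__(self):
        # Resurrect: store a new strong reference to the dying object.
        Phoenix.survivors.append(self)

events = []
p = Phoenix()
ref = weakref.ref(p, lambda r: events.append("callback"))

del p                               # refcount hits zero; __del__ resurrects
alive_after_del = ref() is not None # weakref was not cleared
fired_early = bool(events)          # callback has not run yet

Phoenix.survivors.clear()           # drop the last reference for real
dead_at_end = ref() is None         # now the weakref is cleared
```

The second drop does not rerun __del__, since the object is already
marked as finalized; only the weakref callback fires at that point.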

But PEP 442 already broke all that :-). Now weakref callbacks can
happen before __del__, and they can happen on objects that are about
to be resurrected. So if we wanted to pursue this then it seems like
it would make sense to standardize on the following sequence for
object teardown:

0) object becomes collectible (either refcount == 0 or it's part of a
cyclic isolate)
1) weakref callbacks fire
2) weakrefs are cleared (unconditionally, so we keep the rule that any
given weakref fires at most once, even if the object is resurrected)
3) if _PyGC_REFS_MASK_FINALIZED isn't set, __del__ fires, and then
_PyGC_REFS_MASK_FINALIZED is set
4) check for resurrection
5) deallocate the object
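
For cyclic isolates, CPython already runs callbacks before __del__,
except that today it clears the weakref *before* invoking the callback
rather than after. A quick sketch to check that, assuming CPython 3.4 or
later with the default cyclic collector (Node, on_dead and events are
illustrative names):

```python
import gc
import weakref

events = []

class Node:
    def __del__(self):
        events.append("__del__")

def on_dead(ref):
    # Record whether the weakref was already cleared when we were called.
    events.append(("callback", ref() is None))

n = Node()
n.me = n                     # self-cycle: only the cyclic GC can free it
r = weakref.ref(n, on_dead)  # r itself stays reachable via this global
del n
gc.collect()
```

After the collection, events should be [("callback", True), "__del__"]:
the callback ran first, and it could not reach the object through its
weakref.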

On further thought, this does still introduce one new edge case: even
if we keep the guarantee that no individual weakref can fire more than
once, *new* weakrefs can be registered after resurrection, so an object
could be resurrected multiple times. (Currently, resurrection can only
happen once, because __del__ is disabled on resurrected objects and
weakrefs can't resurrect at all.) I'm not actually sure that this is
even a problem, but in any case it's easy to fix: make a rule that you
can't take a weakref to an object whose _PyGC_REFS_MASK_FINALIZED flag
is already set, and adjust the teardown sequence to be:

0) object becomes collectible (either refcount == 0 or it's part of a
cyclic isolate)
1) if _PyGC_REFS_MASK_FINALIZED is set, then go to step 7. Otherwise:
2) set _PyGC_REFS_MASK_FINALIZED
3) weakref callbacks fire
4) weakrefs are cleared (unconditionally)
5) __del__ fires
6) check for resurrection
7) deallocate the object

There remains one obscure corner case where multiple resurrection is
possible, because the resurrection-prevention flag doesn't exist on
non-GC objects, so you'd still be able to take new weakrefs to those.
But in that case __del__ can already do multiple resurrections, and
some fellow named Nick Coghlan seemed to think that was okay back in
2013 [1], so probably it's not too bad ;-).

[1] https://mail.python.org/pipermail/python-dev/2013-June/126850.html

> This means the main problem with a strong reference being reachable
> from the weakref callback object remains: if the callback itself is
> reachable, then the original object is reachable, and you don't have a
> collectible cycle anymore.
>
>     >>> c = C()
>     >>> def cb2(obj):
>     ...     print("weakref callback called with object reference")
>     ...
>     >>> weakref.finalize(c, cb2, c)
>     <finalize object at 0x7f4300b710b0; for 'C' at 0x7f42f8ae3470>
>     >>> del c
>     >>>
>
> Changing that to support resurrecting the object so it can be passed
> into the callback without the callback itself holding a strong
> reference means losing the main "reasoning about software" benefit
> that weakref callbacks offer: they currently can't resurrect the
> object they relate to (since they never receive a strong reference to
> it), so it nominally doesn't matter if the interpreter calls them
> before or after that object has been entirely cleaned up.

I guess I'm missing the importance of this -- does the interpreter
gain some particular benefit from having flexibility about when to
fire weakref callbacks? Obviously it has to pick one in practice.

(The async use case that got me thinking about this is, of course,
exactly one where we would want a weakref callback to resurrect the
object it refers to. Only once, though.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org