[Python-Dev] Quick-and-dirty weak references

Tim Peters tim_one@email.msn.com
Mon, 16 Aug 1999 02:42:17 -0400


[Jack Jansen]
> ...

A long time ago, Dianne Hackborn actually implemented a scheme like this,
under the name VREF (for "virtual reference", or some such).  IIRC,
differences from your scheme were mainly that:

1) There was an elaborate proxy mechanism to avoid having to explicitly
strengthen the weak.

2) Each object contained a pointer to a linked list of associated weak refs.

This predates DejaNews, so may be a pain to find.

> ...
> We add a new builtin function (or a module with that function)
> weak(). This returns a weak reference to the object passed as a
> parameter. A weak object has one method: strong(), which returns the
> corresponding real object or raises an exception if the object doesn't
> exist anymore.

This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash"
functions, except that their hash returns an (unbounded) int and guarantees
that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a
stronger guarantee than Python's "id", which may return the same int for
objects with disjoint lifetimes; the other reason object address isn't
appropriate for them is that objects can be moved by garbage collection, but
hash is an object invariant).

Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises
an exception.  By most accounts (I haven't used it seriously myself), it's a
usable interface.

> ...
> to implement this I need to add a pointer to every object.

That's unattractive, of course.

> ...
> (actually: we could make do with a single bit in every object, with
> the bit meaning "this object has an associated weak object". We could
> then use a global dictionary indexed by object address to find the
> weak object)

Is a single bit actually smaller than a pointer?  For example, on most
machines these days

#define PyObject_HEAD \
	int ob_refcnt; \
	struct _typeobject *ob_type;

is two 4-byte fields packed solid already, and structure padding prevents
adding anything less than a 4-byte increment in reality.  I guess on Alpha
there's a 4-byte hole here, but I don't want weak pointers enough to switch
machines <wink>.

OTOH, sooner or later Guido is going to want a mark bit too, so the other
way to view this is that 32 new flag bits are as cheap as one <wink>.

There's one other thing I like about this:  it can get rid of the dicey

> Strong() checks that self->object->weak == self and returns
> self->object (INCREFfed) if it is.

check.  If object has gone away, you're worried that self->object may (on
some systems) point to a newly-invalid address.  But worse than that, its
memory may get reused, and then self->object may point into the *middle* of
some other object where the bit pattern at the "weak" offset just happens to
equal self.

Let's try a sketch in pseduo-Python, where __xxx are secret functions that
do the obvious things (and glossing over thread safety since these are
presumably really implemented in C):

# invariant:  __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
# So "the weak bit" is simply an optimization, sparing most objects
# from a dict lookup when they die.
# The invariant is delicate in the presence of threads.

id2weak = {}

class _Weak:
    def __init__(self, obj):
        self.id = id(obj)  # obj's refcount not bumped
        __set_weak_bit(obj)
        id2weak[self.id] = self
        # note that "the system" (see below) sets self.id
        # to None if obj dies

    def strong(self):
        if self.id is None:
            raise DeadManWalkingError(self.id)
        return __id2obj(self.id)  # will bump obj's refcount

    def __del__(self):
        # this is purely an optimization:  if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

def weak(obj):
    return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

    if __is_weak_bit_set(obj):
        objid = id(obj)
        id2weak[objid].id = None
        del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads),
portable and reasonably fast; I do think the extra bit costs 4 bytes,
though.

> ...
> The weak object isn't transparent, because you have to call strong()
> before you can do anything with it, but this is an advantage (says he,
> aspiring to a career in politics or sales:-): with a transparent weak
> object the object could disappear at unexpected moments and with this
> scheme it can't, because when you have the object itself in hand you
> have a refcount too.

Explicit is better than implicit for me.

[M.-A. Lemburg]
> Have you checked the weak reference dictionary implementation
> by Dieter Maurer ? It's at:
>
>	http://www.handshake.de/~dieter/weakdict.html

A project where I work is using it; it blows up a lot <wink/frown>.

While some form of weak dict is what most people want in the end, I'm not
sure Dieter's decision to support weak dicts with only weak values (not weak
keys) is sufficient.  For example, the aforementioned project wants to
associate various computed long strings with certain hashable objects, and
for some reason or other (ain't my project ...) these objects can't be
changed.  So they can't store the strings in the objects.  So they'd like to
map the objects to the strings via assorted dicts.  But using the object as
a dict key keeps it (and, via the dicts, also its associated strings)
artificially alive; they really want a weakdict with weak *keys*.

I'm not sure I know of a clear & fast way to implement a weakdict building
only on the weak() function.  Jack?

Using weak objects as values (or keys) with an ordinary dict can prevent
their referents from being kept artificially alive, but that doesn't get the
dict itself cleaned up by magic.  Perhaps "the system" should notify a weak
object when its referent goes away; that would at least give the WO a chance
to purge itself from structures it knows it's in ...

> ...
> BTW, how would this be done in JPython ? I guess it doesn't
> make much sense there because cycles are no problem for the
> Java VM GC.

Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of
"hard", "soft", "weak", and "phantom" references.  See java.lang.ref for
details.  I stopped paying attention to Java, so it's up to you to tell us
what you learn about it <wink>.