[Python-ideas] weakref.WeakKeyDictionary is (basically) useless

Kevin Norris nykevin.norris at gmail.com
Wed Dec 31 01:22:36 CET 2014


Let's talk about weakref.WeakKeyDictionary.

First, what is it?  It's a dictionary whose keys are referenced
weakly.  That is, the dictionary takes weak references to its keys.
If a key is garbage collected, it magically vanishes from the
dictionary.  This saves programmers much of the trouble of manually
culling dead weak references from data structures.

Why would you use it?  The official documentation mentions
implementing a [descriptor][1].  See [this paste][2] for a simple
example of this sort of thing.  Turning the WeakKeyDictionary into a
regular dictionary would be troublesome.  Since the descriptor object
will (probably) never die, if it used a normal dictionary, it would
keep all/most/some/more-than-one instances of Bar() alive forever.
While this simple example would be better implemented as a property,
or even a bare attribute, it is nice if you have many separate Bar
classes and complicated per-property processing.

Why is it broken?  To go back to our simple example, imagine someone
subclasses Bar (let's call the subclass Baz) and implements __eq__()
and/or __hash__().  Instances of Baz may not work at all (if __hash__
is None) or they may "work" but behave nonsensically (all values-equal
instances of Baz will have the same .foo attribute - imagine trying to
track that bug down!).  While we could forbid such a subclass in our
API docs, this runs afoul of the open/closed principle.

What can we do about it?  It's possible someone is relying on this
behavior, so we probably should not change the existing class.
Instead, we can offer a new class called, say, WeakKeyIDDictionary.
Said class would use object identity instead of object
equality/hashing to uniquify keys.  It would also reference keys
weakly, just like WeakKeyDictionary does now.

Implementing such a class is somewhat tricky, since weak references
are difficult to reason about correctly.  I've written up a
[simplistic prototype][3], but I'm concerned about both performance
and correctness.  Specifically, I know that I need to call
_remove_dead_refs() periodically, but I don't see where I can do that
without compromising performance (e.g. making key lookup O(n) is a Bad
Idea).  I also would note it looks nothing like the stdlib
implementations of WeakKey/ValueDictionary, which finagle raw weakrefs
directly.  These problems are related, of course, since the stdlib
uses finalizer callbacks to cull dead references as they die.
Frankly, the stdlib's code looks incredibly hairy to me and I doubt I
could correctly re-implement it.  Finally, I've no idea if it's
subject to race conditions or other issues (it is a completely
untested example I threw together in 5 minutes).

Because this class is simultaneously useful and hard to write
correctly, it would be a nice addition to the standard library
(probably not in the form I've written it!), or failing that, some
other library.  I've Googled for such a library and come up empty.

Should I submit a bug, a PEP, or is this whole idea just stupid?

[1]: https://docs.python.org/3/reference/datamodel.html#implementing-descriptors
[2]: http://pastebin.com/90j2ZAqC
[3]: http://pastebin.com/cpZh2bTa

--
Kevin Norris


More information about the Python-ideas mailing list