[Python-Dev] defaultdict and on_missing()

Wed Feb 22 12:45:47 CET 2006

I'm concerned that the on_missing() part of the proposal is gratuitous.  The main use cases for defaultdict have a simple factory that supplies a zero, empty list, or empty set.  The on_missing() hook is only there to support the rarer case of needing a key to compute a default value.  The hook is not needed for the main use cases.
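A minimal sketch of those main use cases, assuming a factory-only defaultdict like the one that later shipped as collections.defaultdict (the sample word list is made up for illustration):

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "apple"]  # illustrative data

counts = defaultdict(int)       # factory supplies a zero
by_letter = defaultdict(list)   # factory supplies an empty list
seen = defaultdict(set)         # factory supplies an empty set

for w in words:
    counts[w] += 1               # counting
    by_letter[w[0]].append(w)    # grouping, duplicates kept
    seen[w[0]].add(w)            # grouping, duplicates dropped
```

None of the three factories needs to know which key was missing.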

As it stands, we're adding a method to regular dicts that cannot be usefully called directly.  Essentially, it is a framework method meant to be overridden in a subclass.  So, it only makes sense in the context of subclassing.  In the meantime, we've added an oddball method to the main dict API, arguably the most important object API in Python.  

To use the hook, you write something like this:

    class D(dict):
        def on_missing(self, key):
            return somefunc(key)

However, we can already do something like that without the hook:

    class D(dict):
        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                self[key] = value = somefunc(key)
                return value

The latter form is already possible, doesn't require modifying a basic API, and is arguably clearer about when it is called and what it does (the former doesn't explicitly show that the returned value gets saved in the dictionary).
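As a runnable sketch of the latter form, assuming a hypothetical somefunc() that simply upper-cases the key (a stand-in chosen only for illustration), the following shows that the computed value is stored on first access:

```python
def somefunc(key):
    # Hypothetical stand-in: compute a default from the key itself.
    return key.upper()

class D(dict):
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Compute the default, save it in the dict, and return it.
            self[key] = value = somefunc(key)
            return value

d = D(a="existing")
first = d["a"]    # key present: stored value returned unchanged
second = d["b"]   # key missing: default computed and saved
```

After the second lookup, "b" is a real key in d. Lookups that do not go through __getitem__, such as d.get("c"), are unaffected by the override.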

Since we can already do the latter form, we can get some insight into whether the need has ever actually arisen in real code.  I scanned the usual sources (my own code, the standard library, and my most commonly used third-party libraries) and found no instances of code like that.  The closest approximation was safe_substitute() in string.Template where missing keys returned themselves as a default value.  Since that was the only case that came close, I conclude that there isn't sufficient need to warrant adding a funky method to the API for regular dicts.

I wondered why the safe_substitute() example was unique.  I think the answer is that we normally handle default computations through simple in-line code ("if k in d: do1() else do2()" or a try/except pair).  Overriding on_missing() then is really only useful when you need to create a type that can be passed to a client function that was expecting a regular dictionary.  So it does come up, but not much.
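To illustrate the client-function case, here is a hedged sketch.  The D subclass and somefunc() are hypothetical; Template.safe_substitute() and %-formatting with a mapping are the standard behaviors being exercised.

```python
from string import Template

def somefunc(key):
    # Hypothetical default: a missing key stands in for its own placeholder.
    return "$" + key

class D(dict):
    # Same __getitem__ override as the second D class shown above.
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            self[key] = value = somefunc(key)
            return value

# safe_substitute() leaves unknown placeholders in place.
safe = Template("$who likes $what").safe_substitute({"who": "tim"})

# %-formatting is a client that expects a mapping; it looks keys up
# through __getitem__, so the subclass can supply the missing value.
formatted = "%(who)s likes %(what)s" % D(who="tim")
```

In both cases the caller cannot wrap each lookup in its own if/else or try/except, which is why a custom mapping type is needed at all.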

Aside:  Why on_missing() is an oddball among dict methods.  When teaching dicts to beginners, all the methods are easily explainable except this one.  You don't call this method directly, you only use it when subclassing, you have to override it to do anything useful, it hooks KeyError but only when raised by __getitem__ and not by other methods, etc.  I'm concerned that even having this method in the regular dict API will create confusion about when to use dict.get(), when to use dict.setdefault(), when to catch a KeyError, or when to LBYL.  Adding this one extra choice makes the choice more difficult.
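For reference, the existing choices named above can be sketched side by side.  This is only an illustration with made-up keys and values:

```python
d = {"a": 1}

# dict.get(): return a default without storing it.
got = d.get("b", 0)

# dict.setdefault(): return a default and store it under the key.
stored = d.setdefault("c", [])
stored.append("x")   # mutates the list now held in d["c"]

# Catch KeyError: try the lookup and handle the failure.
try:
    caught = d["z"]
except KeyError:
    caught = None

# LBYL ("look before you leap"): test membership first.
looked = d["a"] if "a" in d else None
```

Each of these is written at the call site, so the reader can see exactly when the default applies and whether it is saved.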

My recommendation:  Dump the on_missing() hook.  That leaves the dict API unmolested and allows a more straightforward implementation/explanation of collections.default_dict or whatever it ends up being named.  The result is delightfully simple and easy to understand/explain.

Raymond
