[Python-Dev] Proposal: defaultdict

Fri Feb 17 08:41:37 CET 2006

On Thu, 16 Feb 2006 13:11:49 -0800, Guido van Rossum <guido at python.org> wrote:

>A bunch of Googlers were discussing the best way of doing the
>following (a common idiom when maintaining a dict of lists of values
>relating to a key, sometimes called a multimap):
>
>  if key not in d: d[key] = []
>  d[key].append(value)
>
>An alternative way to spell this uses setdefault(), but it's not very readable:
>
>  d.setdefault(key, []).append(value)
>
>and it also suffers from creating an unnecessary list instance.
>(Timings were inconclusive; the approaches are within 5-10% of each
>other in speed.)
>
>My conclusion is that setdefault() is a failure -- it was a
>well-intentioned construct, but doesn't actually create more readable
>code.
>
>Google has an internal data type called a DefaultDict which gets
>passed a default value upon construction. Its __getitem__ method,
>instead of raising KeyError, inserts a shallow copy (!) of the given
>default value into the dict when the value is not found. So the above
>code, after
>
>  d = DefaultDict([])
>
>can be written as simply
>
>  d[key].append(value)
>
Wouldn't it be more generally powerful to pass type or factory function
to use to instantiate a default object when a missing key is encountered, e.g.

   d = DefaultDict(list)

then

   d[key].append(value)

but then you can also do

   d = DefaultDict(dict)
   d[key].update(a=123)

or

   class Foo(object): pass
   d = DefaultDict(Foo)
   d[key].phone = '415-555-1212'

etc. No worries about generalizing shallow copying either ;-)

>Note that of all the possible semantics for __getitem__ that could
>have produced similar results (e.g. not inserting the default in the
>underlying dict, or not copying the default value), the chosen
>semantics are the only ones that makes this example work.
>
>Over lunch with Alex Martelli, he proposed that a subclass of dict
>with this behavior (but implemented in C) would be a good addition to
>the language. It looks like it wouldn't be hard to implement. It could
>be a builtin named defaultdict. The first, required, argument to the
>constructor should be the default value. Remaining arguments (even
>keyword args) are passed unchanged to the dict constructor.
>
>Some more design subtleties:
>
>- "key in d" still returns False if the key isn't there
>- "d.get(key)" still returns None if the key isn't there
>- "d.default" should be a read-only attribute giving the default value
>
>Feedback?
>
See above.

Regards,
Bengt Richter