[Python-Dev] Proposal: defaultdict

Steve Holden steve at holdenweb.com
Sun Feb 19 04:44:37 CET 2006

Guido van Rossum wrote:
> On 2/16/06, Guido van Rossum <guido at python.org> wrote:
>>Over lunch with Alex Martelli, he proposed that a subclass of dict
>>with this behavior (but implemented in C) would be a good addition to
>>the language. It looks like it wouldn't be hard to implement. It could
>>be a builtin named defaultdict. The first, required, argument to the
>>constructor should be the default value. Remaining arguments (even
>>keyword args) are passed unchanged to the dict constructor.
> Thanks for all the constructive feedback. Here are some responses and
> a new proposal.
> - Yes, I'd like to kill setdefault() in 3.0 if not sooner.
> - It would indeed be nice if this was an optional feature of the
> standard dict type.
> - I'm ignoring the request for other features (ordering, key
> transforms). If you want one of these, write a PEP!
> - Many, many people suggested to use a factory function instead of a
> default value. This is indeed a much better idea (although slightly
> more cumbersome for the simplest cases).
One might think about calling the default if it were callable, otherwise
using it literally. Of course this would require jiggery-pokery in the cases
where you actually *wanted* the default value to be a callable (you'd
have to provide a callable that returns the callable as a default).
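A minimal sketch of that call-if-callable alternative. It is written against the `__missing__` hook that `dict` subclasses support (added in Python 2.5). The class name `CallableOrValueDict` and its `default` attribute are hypothetical and are not part of any proposal in this thread.

```python
class CallableOrValueDict(dict):
    """Hypothetical dict: the default is called if callable, else used literally."""

    def __init__(self, default, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        # dict.__getitem__ invokes __missing__ on subclasses when key is absent.
        if callable(self.default):
            value = self.default()
        else:
            value = self.default
        self[key] = value
        return value


counts = CallableOrValueDict(0)      # literal default
counts["spam"] += 1

groups = CallableOrValueDict(list)   # callable: a fresh list for each key
groups["eggs"].append(1)

# The jiggery-pokery: to get a callable *as* the default,
# wrap it in another callable that returns it.
handlers = CallableOrValueDict(lambda: len)
```

The literal branch has a further hazard. A mutable literal such as `[]` would be the same object for every missing key, which is one reason a factory is the safer design.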

> - Some people seem to think that a subclass constructor signature must
> match the base class constructor signature. That's not so. The
> subclass constructor must just be careful to call the base class
> constructor with the correct arguments. Think of the subclass
> constructor as a factory function.
True, but then this does get in the way of treating the base dict and 
its defaulting subtype polymorphically. That might not be a big issue.
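The following sketches what Guido describes. It assumes a hypothetical subclass named `DefaultDict` whose first argument is the factory and whose remaining arguments go straight to `dict`. The final lines illustrate the polymorphism concern: generic code that rebuilds a mapping with `type(d)(...)` passes its argument where the subclass expects a factory.

```python
class DefaultDict(dict):
    """Hypothetical subclass whose constructor signature differs from dict's."""

    def __init__(self, default_factory, *args, **kwargs):
        # Behave like a factory function: consume our own argument,
        # then hand everything else to the base-class constructor.
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value
        return value


d = DefaultDict(list, {"a": [1]}, b=[2])
d["c"].append(3)


def rebuild(mapping, items):
    # Generic code that assumes every dict type accepts dict's signature.
    return type(mapping)(items)


clone = rebuild(d, {"x": [9]})
# clone.default_factory is now {"x": [9]}, and clone itself is empty.
```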

> - There's a fundamental difference between associating the default
> value with the dict object, and associating it with the call. So
> proposals to invent a better name/signature for setdefault() don't
> compete. (As to one specific such proposal, adding an optional bool as
> the 3rd argument to get(), I believe I've explained enough times in
> the past that flag-like arguments that always get a constant passed in
> at the call site are a bad idea and should usually be refactored into
> two separate methods.)
> - The inconsistency introduced by __getitem__() returning a value for
> keys while get(), __contains__(), and keys() etc. don't show it,
> cannot be resolved usefully. You'll just have to live with it.
> Modifying get() to do the same thing as __getitem__() doesn't seem
> useful -- it just takes away a potentially useful operation.
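The inconsistency is easy to observe in `collections.defaultdict`, the form this idea eventually took in the standard library (Python 2.5 and later). This sketch assumes that class rather than a `default_factory` slot on `dict` itself.

```python
from collections import defaultdict

d = defaultdict(list)

# Only __getitem__ consults the factory; the other accessors do not.
missing_before = "x" not in d        # __contains__ does not see a default
via_get = d.get("x")                 # get() returns None and inserts nothing
keys_before = list(d.keys())         # empty: no phantom keys yet

value = d["x"]                       # __getitem__ calls list() and inserts it

present_after = "x" in d
keys_after = list(d.keys())
```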
> So here's a new proposal.
> Let's add a generic missing-key handling method to the dict class, as
> well as a default_factory slot initialized to None. The implementation
> is like this (but in C):
> def on_missing(self, key):
>   if self.default_factory is not None:
>     value = self.default_factory()
>     self[key] = value
>     return value
>   raise KeyError(key)
> When __getitem__() (and *only* __getitem__()) finds that the requested
> key is not present in the dict, it calls self.on_missing(key) and
> returns whatever it returns -- or raises whatever it raises.
> __getitem__() doesn't need to raise KeyError any more, that's done by
> on_missing().
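Here is a runnable, pure-Python approximation of the proposed hook, with two assumptions:

* A plain `dict` instance cannot take new attributes, so this uses a hypothetical subclass, `OnMissingDict`, with `default_factory` as a class attribute defaulting to `None`.
* `__getitem__` is overridden by hand instead of being changed in C.

The hook shipped in Python 2.5 under the name `__missing__`.

```python
class OnMissingDict(dict):
    """Hypothetical emulation of the proposed on_missing() protocol."""

    default_factory = None  # the proposed slot, initialized to None

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Only __getitem__ delegates here; get() and `in` are untouched.
            return self.on_missing(key)

    def on_missing(self, key):
        if self.default_factory is not None:
            value = self.default_factory()
            self[key] = value
            return value
        raise KeyError(key)


plain = OnMissingDict()
try:
    plain["nope"]
except KeyError:
    pass  # no factory: behaves like an ordinary dict

multi = OnMissingDict()
multi.default_factory = list  # instance attribute shadows the class default
multi["k"].append("v")
```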
> The on_missing() method can be overridden to implement any semantics
> you want when the key isn't found: return a value without inserting
> it, insert a value without copying it, only do it for certain key
> types/values, make the default incorporate the key, etc.
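Three of the overrides listed above are sketched here with the `__missing__` hook that Python 2.5 adopted in place of the proposed `on_missing()`:

* only supplying a default for certain key types;
* making the default incorporate the key;
* returning a value without inserting it.

Both class names are hypothetical.

```python
class KeyAwareDict(dict):
    """Hypothetical: str keys get a key-derived default, which is inserted."""

    def __missing__(self, key):
        if not isinstance(key, str):
            raise KeyError(key)   # only default for certain key types
        value = "<%s>" % key      # the default incorporates the key
        self[key] = value
        return value


class NonInsertingDict(dict):
    """Hypothetical: return a value without inserting it."""

    def __missing__(self, key):
        return 0


labels = KeyAwareDict()
first = labels["eggs"]

tally = NonInsertingDict()
peek = tally["spam"]
```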
> But the default implementation is designed so that we can write
> d = {}
> d.default_factory = list
> to create a dict that inserts a new list whenever a key is not found
> in __getitem__(), which is most useful in the original use case:
> implementing a multiset so that one can write
> d[key].append(value)
> to add a new key/value to the multiset without having to handle the
> case separately where the key isn't in the dict yet. This also works
> for sets instead of lists:
> d = {}
> d.default_factory = set
> ...
> d[key].add(value)
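The following compares the same multiset idiom in released Python. The standard library ended up with `collections.defaultdict` (Python 2.5 and later) rather than a settable `default_factory` on every `dict`, so this sketch assumes that class. The `setdefault()` loop shows the per-call alternative for contrast. The sample `pairs` data is made up for illustration.

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear"), ("fruit", "apple")]

# Per-call default: setdefault() must repeat the default at every call site.
by_call = {}
for key, value in pairs:
    by_call.setdefault(key, []).append(value)

# Per-object default: the factory is stated once, at construction.
as_lists = defaultdict(list)
for key, value in pairs:
    as_lists[key].append(value)

as_sets = defaultdict(set)
for key, value in pairs:
    as_sets[key].add(value)
```

Note that `setdefault(key, [])` builds a new empty list on every call, even when the key is already present. The factory is only called for keys that are actually missing.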
This seems like a very good compromise.

[non-functional alternatives ...]
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
