Why defaultdict?

Thomas Jollans thomas at jollans.com
Fri Jul 2 05:20:48 EDT 2010


On 07/02/2010 06:11 AM, Steven D'Aprano wrote:
> I would like to better understand some of the design choices made in 
> collections.defaultdict.
> 
> Firstly, to initialise a defaultdict, you do this:
> 
> from collections import defaultdict
> d = defaultdict(callable, *args)
> 
> which sets an attribute of d "default_factory" which is called on key 
> lookups when the key is missing. If callable is None, defaultdicts are 
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't 
> added on to dict rather than a separate class that needed to be imported. 
> That is:
> 
> d = dict(*args)
> d.default_factory = callable

That's just not what dicts do; they're a very simple, elementary data type.
I know that isn't really a good reason, but in addition to what Chris said,
I expect this would complicate the dict code a great deal.
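For reference, here is a small demonstration of the behaviour Steven describes. It is a sketch run under Python 3, and the variable names are arbitrary. On a defaultdict, default_factory is an ordinary writable attribute. With that attribute set to None, a missing key raises KeyError just as it would for a dict. A built-in dict instance has no such attribute, and one can't be attached to it.

```python
from collections import defaultdict

# default_factory is a normal, writable attribute of a defaultdict
d = defaultdict(list, a=[1])
d["b"].append(2)  # missing key: list() is called, stored under "b", then appended to
print(d)

# With default_factory set to None, a missing key raises KeyError, as for dict
d.default_factory = None
try:
    d["c"]
except KeyError as exc:
    print("KeyError:", exc)

# A built-in dict instance has no default_factory attribute and cannot get one
plain = dict(a=1)
try:
    plain.default_factory = list
except AttributeError as exc:
    print("AttributeError:", exc)
```

So Steven's proposed "d.default_factory = callable" would need new per-instance state on every dict, which is part of the complication.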

> 
> If you failed to explicitly set the dict's default_factory, it would 
> behave precisely as dicts do now. So why create a new class that needs to 
> be imported, rather than just add the functionality to dict?
> 
> Is it just an aesthetic choice to support passing the factory function as 
> the first argument? I would think that the advantage of having it built-
> in would far outweigh the cost of an explicit attribute assignment.
> 

The cost of this feature would be over-complicating the built-in dict
type, when a subclass does just as well.
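The following sketch shows how a subclass does just as well. dict.__getitem__ already calls a __missing__ method when one is defined on a subclass, so a few lines reproduce the core of defaultdict. FactoryDict is a hypothetical name. This is not how CPython implements defaultdict, which is written in C and also handles repr, copying and pickling.

```python
class FactoryDict(dict):
    """Hypothetical dict subclass that mimics defaultdict via __missing__."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when key is absent
        if self.default_factory is None:
            raise KeyError(key)
        # Call the factory with no arguments, store the result, return it
        value = self[key] = self.default_factory()
        return value


counts = FactoryDict(int)
for word in "the cat saw the dog".split():
    counts[word] += 1  # first lookup of each word stores int() == 0
print(counts)
```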

> 
> 
> Second, why is the factory function not called with key? There are three 
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
> 
> (1) The default value depends on the key in some way: return factory(key)

I agree, this is a strange choice. However, nothing's stopping you from
being a bit verbose about what you want and just doing it:

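from collections import defaultdict
class mydict(defaultdict):
    def __missing__(self, key):
        # pass the key to the factory, then store and return the result
        self[key] = value = self.default_factory(key)
        return value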

The __missing__ method is really the more useful bit that the defaultdict
class adds, by the looks of it.
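The sketch below covers case (1), where the default depends on the key, and it also shows one caveat. dict.__getitem__ honours the __missing__ hook on any dict subclass, but only d[key] consults the hook. dict.get() and the "in" operator never consult it. Squares is a hypothetical class made up for illustration.

```python
class Squares(dict):
    """Hypothetical example: the default depends on the key (Steven's case 1)."""

    def __missing__(self, key):
        # Called by d[key] only when key is absent; store and return key * key
        self[key] = key * key
        return self[key]


sq = Squares()
print(sq[4])      # 16, computed from the key by __missing__ and stored
print(sq.get(5))  # None: dict.get() never consults __missing__
print(5 in sq)    # False: neither does the "in" operator
print(sq)         # {4: 16}
```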

-- Thomas

> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C
> 
> defaultdict supports (2) and (3):
> 
> defaultdict(factory, *args)
> defaultdict(lambda: C, *args)
> 
> but it doesn't support (1). If key were passed to the factory function, 
> it would be easy to support all three use-cases, at the cost of a 
> slightly more complex factory function. E.g. the current idiom:
> 
> defaultdict(factory, *args)
> 
> would become:
> 
> defaultdict(lambda key: factory(), *args)
> 
> 
> (There is a zeroth case as well, where the default value depends on the 
> key and what else is in the dict: factory(d, key). But I suspect that's 
> well and truly YAGNI territory.)


