Why defaultdict?
Thomas Jollans
thomas at jollans.com
Fri Jul 2 05:20:48 EDT 2010
On 07/02/2010 06:11 AM, Steven D'Aprano wrote:
> I would like to better understand some of the design choices made in
> collections.defaultdict.
>
> Firstly, to initialise a defaultdict, you do this:
>
> from collections import defaultdict
> d = defaultdict(callable, *args)
>
> which sets an attribute of d "default_factory" which is called on key
> lookups when the key is missing. If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported.
> That is:
>
> d = dict(*args)
> d.default_factory = callable
That's just not what dicts, a very simple and elementary data type, do.
I know this isn't really a good reason. In addition to what Chris said,
I expect this would complicate the dict code a great deal.
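
For reference, the quoted claim about a None factory can be checked with a
short sketch. It only compares lookup behaviour and equality; the helper
name lookup_fails is made up for illustration, and the two objects still
differ in type and repr.

```python
from collections import defaultdict

plain = dict(a=1)
dd = defaultdict(None, a=1)  # default_factory is None


def lookup_fails(mapping, key):
    """Return True if mapping[key] raises KeyError (hypothetical helper)."""
    try:
        mapping[key]
    except KeyError:
        return True
    return False


# With no factory set, a missing key raises KeyError in both cases,
# and nothing is inserted into the defaultdict as a side effect.
missing_in_plain = lookup_fails(plain, "b")
missing_in_dd = lookup_fails(dd, "b")
same_contents = (dd == plain)
```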
>
> If you failed to explicitly set the dict's default_factory, it would
> behave precisely as dicts do now. So why create a new class that needs to
> be imported, rather than just add the functionality to dict?
>
> Is it just an aesthetic choice to support passing the factory function as
> the first argument? I would think that the advantage of having it built-
> in would far outweigh the cost of an explicit attribute assignment.
>
The cost of this feature would be over-complication of the built-in dict
type when a subclass would do just as well.
>
>
> Second, why is the factory function not called with key? There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)
I agree, this is a strange choice. However, nothing's stopping you from
being a bit verbose about what you want and just doing it:
    class mydict(defaultdict):
        def __missing__(self, key):
            # ...
The __missing__ method is really the more useful bit that the defaultdict
class adds, by the looks of it.
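
A minimal sketch of what such a subclass might look like, assuming you want
the factory to receive the key. The class name KeyedDefaultDict and the
squares example are invented here for illustration; this is one possible
way to do it, not how defaultdict itself behaves.

```python
from collections import defaultdict


class KeyedDefaultDict(defaultdict):
    """Hypothetical subclass that calls default_factory(key), not default_factory()."""

    def __missing__(self, key):
        # Mirror defaultdict: with no factory, a missing key is an error.
        if self.default_factory is None:
            raise KeyError(key)
        # Compute the default from the key, store it, and return it.
        value = self[key] = self.default_factory(key)
        return value


squares = KeyedDefaultDict(lambda key: key * key)
first = squares[4]  # triggers __missing__, which stores the computed value
```

As with a plain defaultdict, the computed value is stored on first access,
so later lookups of the same key do not call the factory again.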
-- Thomas
> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C
>
> defaultdict supports (2) and (3):
>
> defaultdict(factory, *args)
> defaultdict(lambda: C, *args)
>
> but it doesn't support (1). If key were passed to the factory function,
> it would be easy to support all three use-cases, at the cost of a
> slightly more complex factory function. E.g. the current idiom:
>
> defaultdict(factory, *args)
>
> would become:
>
> defaultdict(lambda key: factory(), *args)
>
>
> (There is a zeroth case as well, where the default value depends on the
> key and what else is in the dict: factory(d, key). But I suspect that's
> well and truly YAGNI territory.)
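
For reference, cases (2) and (3) from the quoted text can be sketched with
the existing API as follows. The variable names and sample words are
assumptions made up for the example.

```python
from collections import defaultdict

# Case (2): the default does not depend on the key; the factory is
# called with no arguments (here, list() gives a fresh empty list).
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# Case (3): the default is a constant C, wrapped in a zero-argument lambda.
C = 0
counts = defaultdict(lambda: C)
counts["seen"] += 1
unseen = counts["unseen"]  # a lookup of a missing key also inserts it
```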