[Python-Dev] Proposal: defaultdict

Ian Bicking ianb at colorstudy.com
Fri Feb 17 18:58:30 CET 2006

Raymond Hettinger wrote:
>>>Over lunch with Alex Martelli, he proposed that a subclass of dict
>>>with this behavior (but implemented in C) would be a good addition to
>>>the language
> I would like to add something like this to the collections module, but a PEP is 
> probably needed to deal with issues like:
> * implications of a __getitem__ succeeding while get(value, x) returns x 
> (possibly different from the overall default)
> * implications of a __getitem__ succeeding while __contains__ would fail
> * whether to add this to the collections module (I would say yes)
> * whether to allow default functions as well as default values (so you could 
> instantiate a new default list)
> * comparing all the existing recipes and third-party modules that have already 
> done this
> * evaluating its fitness for common use cases (i.e. bags and dict of lists).

It doesn't seem that useful for bags, assuming we're talking about an 
{object: count} implementation of bags; bags should really have a more 
set-like interface than a dict-like interface.

A dict of lists typically means a multi-valued dict.  In that case it 
seems like x[key_not_found] should return the empty list, as that means 
zero values; even though zero values also means that 
x.has_key(key_not_found) should return False as well.  *but* getting 
x[key_not_found] does not (for a multi-valued dict) mean that suddently 
has_key should return true.  I find the side-effect nature of 
__getitem__ as proposed in default_dict to be rather confusing, and when 
reading code it will very much break my expectations.  I assume that 
attribute access and [] access will not have side effects.  Coming at it 
from that direction, I'm -1, though I'm +1 on dealing with the specific 
use case that started this (x.setdefault(key, []).append(value)).

An implementation targetted specifically at multi-valued dictionaries 
seems like it would be better.  Incidentally, on Web-SIG we've discussed 
wsgiref, and it includes a mutli-values, ordered, case-insensitive 
dictionary.  Such a dictionary(ish) object has clear applicability for 
HTTP headers, but certainly it is something I've used many times 
elsewhere.  In a case-sensitive form it applies to URL variables. 
Really there's several combinations of features, each with different uses.

So we have now...

dicts: unordered, key:value (associative), single-value
sets: unordered, not key:value, single-value
lists: ordered, not key:value, multi-value

We don't have...

bags: unordered, not key:value, multi-value
multi-dict: unordered, key:value, multi-value
ordered-dict: ordered, key:value, single-value
ordered-multi-dict: ordered, key:value, single-value

For all key:value collections, normalized keys can be useful.  (Though 
notably the wsgiref Headers object does not have normalized keys, but 
instead does case-insensitive comparisons.)  I don't know where 
dict-of-dict best fits in here.

Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

More information about the Python-Dev mailing list