[Python-Dev] Counter proposal: multidict
ianb at colorstudy.com
Fri Feb 17 21:51:26 CET 2006
Guido van Rossum wrote:
> On 2/17/06, Ian Bicking <ianb at colorstudy.com> wrote:
>>I really don't like that defaultdict (or a dict extension) means that
>>x[not_found] will have noticeable side effects. This all seems to be a
>>roundabout way to address one important use case of a dictionary with
>>multiple values for each key, and in the process breaking an important
>>quality of good Python code, that attribute and getitem access not have
>>noticeable side effects.
>>So, here's a proposed interface for a new multidict object, borrowing
>>some methods from Set but mostly from dict. Some things that seemed
>>particularly questionable to me are marked with ??.
> Have you seen my revised proposal (which is indeed an addition to the
> standard dict rather than a subclass)?
Yes, and though it is more general it has the same issue of side
effects. Doesn't it seem strange that getting an item will change the
values of .keys(), .items(), and .has_key()?
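A minimal sketch of the side effect, assuming the proposed behavior matches what collections.defaultdict(list) does (that equivalence is an assumption, and `in` stands in for .has_key()):

```python
from collections import defaultdict

# Stand-in for the proposed behavior: missing keys are filled in
# by a factory at lookup time.
d = defaultdict(list)

before = ("x" in d, list(d.keys()), list(d.items()))

d["x"]  # a plain read, no assignment anywhere

after = ("x" in d, list(d.keys()), list(d.items()))

print(before)  # (False, [], [])
print(after)   # (True, ['x'], [('x', [])])
```

Nothing was assigned, yet containment, keys, and items all changed.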
> Your multidict addresses only one use case for the proposed behavior;
> what's so special about dicts of lists that they should have special
> support? What about dicts of dicts, dicts of sets, dicts of
> user-defined objects?
What's so special? 95% (probably more!) of current use of .setdefault()
is .setdefault(key, []).append(value).
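For reference, this is the grouping idiom I mean, shown on made-up sample data (the names pairs and groups are only illustrative):

```python
pairs = [("a", 1), ("b", 2), ("a", 3)]

groups = {}
for key, value in pairs:
    # Create the list the first time a key is seen, then append to it.
    groups.setdefault(key, []).append(value)

print(groups)  # {'a': [1, 3], 'b': [2]}
```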
Also, since when do features have to address all possible cases?
Certainly there are other cases, and I think they can be answered with
other classes. Here are some current options:
* .setdefault() -- works with any subtype; slightly less efficient than
what you propose. Awkward to read; doesn't communicate intent very well.
* UserDict -- works for a few cases where you want to make dict-like
objects. Messes up the concept of identity and containment -- resulting
objects both "are" dictionaries, and "contain" a dictionary (obj.data).
* DictMixin -- does anything you can possibly want, requiring only the
overriding of a couple methods.
* dict subclassing -- does anything you want as well, but you typically
have to override many more methods than with DictMixin (and if you don't
have to override every method, that's not documented in any way). Isn't
written with subclassing in mind. Really, you are proposing that one
specific kind of override be made feasible, either with subclassing or
injecting a method.
That said, I'm not saying that several kinds of behavior shouldn't be
supported. I just don't see why dict should support them all (or
multidict). And I also think dict will support them poorly.
multidict implements one behavior *well*. In a documented way, with a
name people can refer to. I can say "multidict", I can't say "a dict
where I set default_factory to list" (well, I can say that, but that
just opens up yet more questions and clarifications).
Some ways multidict differs from default_factory=list:
* __contains__ works (you have to use .get() with default_factory to get
a meaningful result)
* Barring cases where there are exceptions, x[key] and x.get(key) return
the same value for multidict; with default_factory one returns [] and
the other returns None when the key isn't found. But if you do x[key];
x.get(key) then x.get(key) always returns [].
* You can't use __setitem__ to put non-list items into a multidict; with
multidict you don't have to guard against non-sequence values.
* [] is meaningful not just as the default value, but as a null value;
the multidict implementation respects both aspects.
* Specific method x.add(key, value) that indicates intent in a way that
x[key].append(value) does not.
* items and iteritems return values meaningful to the context (a list of
(key, single_value) pairs -- this is usually what I want, and avoids a
nested for loop). __len__ is also usefully different than in dict.
* .update() handles iteritems sensibly, and updates from dictionaries
sensibly -- if you mix a default_factory=list dict with a "normal"
(single-value) dictionary you'll get an effectively corrupted dictionary
(where some keys are lists)
* x.getfirst(key) is useful
* I think this will be much easier to reason about in situations with
threads -- dict acts very predictably with threads, and people rely upon
that.
* multidict can be written either with subclassing intended, or with an
abstract superclass, so that other kinds of specializations of this
superset of the dict interface can be made more easily (if DictMixin
itself isn't already sufficient)
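A rough sketch of how several of the points above might fit together. This is not the full proposed interface: only add and getfirst are names from the list above; the list-backed storage, the list-only check in __setitem__, and __len__ counting (key, value) pairs are all assumptions made for illustration.

```python
class MultiDict:
    """Illustrative sketch only: maps each key to a list of values."""

    def __init__(self):
        self._data = {}  # key -> non-empty list of values

    def add(self, key, value):
        # States the intent directly, unlike x[key].append(value).
        self._data.setdefault(key, []).append(value)

    def __getitem__(self, key):
        # Missing keys read as []; the lookup stores nothing.
        return list(self._data.get(key, ()))

    def get(self, key):
        # Same answer as x[key], with or without a prior lookup.
        return self[key]

    def getfirst(self, key, default=None):
        values = self._data.get(key)
        return values[0] if values else default

    def __setitem__(self, key, values):
        # Assumption: only lists are accepted, so readers never have
        # to guard against non-sequence values.
        if not isinstance(values, list):
            raise TypeError("multidict values must be lists")
        if values:
            self._data[key] = list(values)
        else:
            # [] doubles as the null value: storing it removes the key.
            self._data.pop(key, None)

    def __contains__(self, key):
        return key in self._data

    def items(self):
        # Flat (key, single_value) pairs; no nested loop for callers.
        return [(k, v) for k, vs in self._data.items() for v in vs]

    def __len__(self):
        # Assumption: length counts (key, value) pairs, not keys.
        return sum(len(vs) for vs in self._data.values())

    def update(self, other):
        # Accepts a MultiDict, a plain single-value dict, or an
        # iterable of (key, value) pairs; each pair is added.
        pairs = other.items() if hasattr(other, "items") else other
        for key, value in pairs:
            self.add(key, value)


m = MultiDict()
m.add("a", 1)
m.add("a", 2)
m.update({"b": 3})  # a "normal" dict mixes in without corruption

print(m["missing"], "missing" in m)  # [] False
print(m.items())                     # [('a', 1), ('a', 2), ('b', 3)]
print(m.getfirst("a"), len(m))       # 1 3
```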
So, I'm saying: multidict handles one very common collection need that
dict handles awkwardly now. multidict is a meaningful and useful class
with its own identity/name and meaning separate from dict, and has
methods that represent both the intersection and the difference between
the two classes. multidict does not in any way preclude other
collection objects for other situations; it is entirely unfair to expect
a new class to solve all issues. multidict suggests an interface that
other related classes can use (e.g., an ordered version). multidict,
unlike default_factory, is not just a recipe for creating a specific and
commonly needed object, it is a class for creating it.
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org