Guido van Rossum wrote:
On 2/17/06, Ian Bicking wrote:
I really don't like that defaultdict (or a dict extension) means that x[not_found] will have noticeable side effects. This all seems to be a roundabout way to address one important use case, a dictionary with multiple values for each key, and in the process it breaks an important quality of good Python code: that attribute and getitem access not have noticeable side effects.
So, here's a proposed interface for a new multidict object, borrowing some methods from Set but mostly from dict. Some things that seemed particularly questionable to me are marked with ??.
Have you seen my revised proposal (which is indeed an addition to the standard dict rather than a subclass)?
Yes, and though it is more general it has the same issue of side effects. Doesn't it seem strange that getting an item will change the values of .keys(), .items(), and .has_key()?
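To make that side effect concrete, here is a minimal sketch. It uses collections.defaultdict as a stand-in for the proposed default_factory behavior; treating the two as equivalent is an assumption made for illustration, and the proposal under discussion may differ in detail.

```python
from collections import defaultdict

# Stand-in for "a dict with default_factory set to list" (an assumption).
d = defaultdict(list)

keys_before = list(d.keys())
_ = d["spam"]                 # a plain read of a missing key...
keys_after = list(d.keys())   # ...has added that key to the dict

print(keys_before, keys_after)

# .get() does not trigger the factory, so it neither inserts nor returns [].
print(d.get("eggs"), "eggs" in d)
```

After the read, "spam" shows up in keys() and in containment tests even though nothing was ever assigned to it, while "eggs" (only looked up with .get()) does not.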
Your multidict addresses only one use case for the proposed behavior; what's so special about dicts of lists that they should have special support? What about dicts of dicts, dicts of sets, dicts of user-defined objects?
What's so special? 95% (probably more!) of current use of .setdefault() is .setdefault(key, []).append(value).

Also, since when do features have to address all possible cases? Certainly there are other cases, and I think they can be answered with other classes. Here are some current options:

* .setdefault() -- works with any subtype; slightly less efficient than what you propose. Awkward to read; doesn't communicate intent very well.
* UserDict -- works for a few cases where you want to make dict-like objects. Messes up the concept of identity and containment -- resulting objects both "are" dictionaries, and "contain" a dictionary (obj.data).
* DictMixin -- does anything you can possibly want, requiring only the overriding of a couple of methods.
* dict subclassing -- does anything you want as well, but you typically have to override many more methods than with DictMixin (and if you don't have to override every method, that's not documented in any way). dict isn't written with subclassing in mind.

Really, you are proposing that one specific kind of override be made feasible, either with subclassing or by injecting a method.

That said, I'm not saying that several kinds of behavior shouldn't be supported. I just don't see why dict should support them all (or multidict). And I also think dict will support them poorly.

multidict implements one behavior *well*. In a documented way, with a name people can refer to. I can say "multidict"; I can't say "a dict where I set default_factory to list" (well, I can say that, but that just opens up yet more questions and clarifications).

Some ways multidict differs from default_factory=list:

* __contains__ works (you have to use .get() with default_factory to get a meaningful result).
* Barring cases where there are exceptions, x[key] and x.get(key) return the same value for multidict; with default_factory one returns [] and the other returns None when the key isn't found. But once you have done x[key], a later x.get(key) returns [], because the lookup stored the default.
* You can't use __setitem__ to put non-list items into a multidict; with multidict you don't have to guard against non-sequence values.
* [] is meaningful not just as the default value, but as a null value; the multidict implementation respects both aspects.
* There is a specific method, x.add(key, value), that indicates intent in a way that x[key].append(value) does not.
* items and iteritems return values meaningful to the context (a list of (key, single_value) pairs; this is usually what I want, and it avoids a nested for loop). __len__ is also usefully different than in dict.
* .update() handles iteritems sensibly, and updates from dictionaries sensibly -- if you mix a default_factory=list dict with a "normal" (single-value) dictionary you'll get an effectively corrupted dictionary (where only some keys are lists).
* x.getfirst(key) is useful.
* I think this will be much easier to reason about in situations with threads -- dict acts very predictably with threads, and people rely upon that.
* multidict can be written either with subclassing intended, or with an abstract superclass, so that other kinds of specializations of this superset of the dict interface can be made more easily (if DictMixin itself isn't already sufficient).

So, I'm saying:

* multidict handles one very common collection need that dict handles awkwardly now.
* multidict is a meaningful and useful class with its own identity/name and meaning separate from dict, and has methods that represent both the intersection and the difference between the two classes.
* multidict does not in any way preclude other collection objects for other situations; it is entirely unfair to expect a new class to solve all issues.
* multidict suggests an interface that other related classes can use (e.g., an ordered version).
* multidict, unlike default_factory, is not just a recipe for creating a specific and commonly needed object, it is a class for creating it.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
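For readers who want to see the differences listed above in running form, here is a rough sketch. The class name MultiDict, the internal _data dict, and every method body are hypothetical illustrations built only from the bullet points; this is not the interface from the original proposal, which is not reproduced in this message.

```python
class MultiDict:
    """Hypothetical sketch: each key maps to a list of values."""

    def __init__(self):
        self._data = {}  # key -> non-empty list of values

    def add(self, key, value):
        # Append a single value under key, creating the list on first use.
        self._data.setdefault(key, []).append(value)

    def __getitem__(self, key):
        # Return a copy of the values; a missing key gives [] and stores nothing.
        return list(self._data.get(key, ()))

    def get(self, key):
        # Same result as x[key].
        return self[key]

    def getfirst(self, key, default=None):
        # First value for key, or default when the key has no values.
        values = self._data.get(key)
        return values[0] if values else default

    def __setitem__(self, key, values):
        # Only lists are accepted; assigning [] removes the key (null value).
        if not isinstance(values, list):
            raise TypeError("multidict values must be lists")
        if values:
            self._data[key] = list(values)
        else:
            self._data.pop(key, None)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        # Counts (key, value) pairs rather than keys.
        return sum(len(values) for values in self._data.values())

    def items(self):
        # Flattened (key, single_value) pairs; no nested loop for the caller.
        return [(k, v) for k, values in self._data.items() for v in values]


m = MultiDict()
m.add("a", 1)
m.add("a", 2)
m.add("b", 3)
print(m["missing"], "missing" in m)  # reading a missing key leaves no trace
print(m.items())
```

The design choice being illustrated is that reads never mutate: x[key] on a missing key returns an empty list without inserting it, so keys, containment, and length stay unchanged.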