[Python-3000] Thoughts on dictionary views
Raymond Hettinger
raymond.hettinger at verizon.net
Tue Feb 20 09:55:57 CET 2007
The Java concept of dictionary views seems to have caught-on here while I wasn't
looking. At the risk of covering some old ground, I would like to re-open the
question. Here are a few thoughts on the subject to kick-off the discussion:
* Maintaining a live (self-updating) view is a bit tricky from an implementation
point-of-view. While it is clearly doable for dictionaries, it is not clear
that it is a good idea for a general mapping API which can be wrapped around
dbms, shelves, elementtrees, b-trees, and other wrascally rabbits. I doubt that
the underlying structures of other mapping types support the observer pattern
necessary to keep views updated -- this is doubly true if the underlying data is
on disk and can be updated by other processes, threads, etc.
* One of the purported benefits is to provide set-like behavior without the
expense of copying to a new set object. FWIW, I've updated the set
implementation to be more interoperable with dictionaries so that the conversion
costs are negligible (about the same as a dict resize operation -- one pass, no
calls to PyObject_Hash, insertion into a presized, sparse table with very few
collisions).
* A dict is also one of Python's most basic APIs (along with lists). Ideally,
we should keep those two APIs as simple as possible (getting rid of setdefault()
and unneeded methods is a step in the right direction). IMO, the views will be
the hardest part of the API to explain and interact with when learning the
language -- to learn about dicts and lists, you already have to learn about
mutability and hashability -- it doesn't help this situation if you then need to
learn about self-updating views that can be deleted, have modified values, but
cannot be added, and that have their own set-like operations but aren't really
sets . . .
* ISTM that views offer three benefits: re-iterability, set behavior, and
self-updates. IMO, the first is not commonly needed and is trivially served by
writing list(mydict.items()) or somesuch. The second is best served by an
explicit conversion to a set or frozenset type -- those two types have been
enormously successful in that they seem to offer a near zero learning curve --
people seem to intuitively know how to use them right out of the box. As long
as that conversion is fast, I think the explicit conversion is the way to go --
it is the way you would do it with any other Python type where you wanted set
behavior. Adding a handful of set methods to dict views would only complicate
an otherwise simple situation and introduce unnecessary complexity (i.e. what
should isinstance(d.d_keys, set) return?). The third benefit (self-updates) is
more interesting and does not have a direct analog with existing python tools,
so the question is how valuable is self-updating behavior and are there
compelling use cases that warrant a more complex API?
My recommendation is to take a more conservative route. Let's make dicts as
simple as possible and then introduce a new collections module entry with the
views bells and whistles. If the collections version proves itself as
enormously popular, useful, understandable, and without a good equivalent, then
it can ask for a promotion. The collections module is nice place to put in
alternate datatypes that meet the more demanding needs of advanced users who
know exactly what they want/need in terms of special behaviors or performance.
And, if we take the collections module route, there is no reason that it cannot
be put into Py2.6 where people will either flock to it or ignore it, with either
result providing us with good guidance for Py3.0.
my-two-cents,
Raymond
More information about the Python-3000
mailing list