Case-insensitive dict, non-destructive, fast, anyone?

Raymond Hettinger vze4rx4y at verizon.net
Fri Apr 1 18:04:42 EST 2005


[Bengt Richter]
> I wonder if a dict with a general override hook for hashing all keys would be
useful.
> E.g.,  a dict.__keyhash__ that would take key as arg and default as now
returning key.__hash__()
> but that you could override. Seems like this could potentially be more
efficient than key wrappers,
> and would also make it so you wouldn't have to chase all the affected methods
in doing an idict
> like the above (e.g., get, setdefault, update etc. etc.)

There has also been a discussion of adding a seqdict that maintains a keys in
insertion order.  Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building).  Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas.  Essentially, the thought is to put all the bells and
whistles in one place.  If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1                  # case insensitive dictionary
assert t['Abc'] == 1

t = TrickyDict()
t.setdefaultvalue(0)
t['x'] += 1                     # very bag like
assert t['x'] == 1

t = TrickyDict()
t.setdefaultfunction(list)
t['x'] += ['first']          # rather like: t.setdefault('x',
[]).append('first')
t['x'] += ['second']
assert t == {'x': ['first', 'second']}

t = TrickyDict()
t.sortedkeys = True
t['x'] = 1
t['y'] = 2
assert t.keys() == ['x', 'y']   # order is guaranteed

This universal dictionary type could also incorporate some methods for other
recurrent themes such as a inverting a dictionary:

    def invdict(self)
        return self: dict((v,k) for k,v in self.iteritems()).

For the performance minded, there could also be a control for dictionary
speed/space efficiency:

t = TrickyDict()
t.maxkeydensity = 40          # resize whenever the dictionary is more than 40%
full

Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide the
best solution.


Raymond





More information about the Python-list mailing list