Clustering the keys of a dict according to its values

Fri Nov 14 08:12:09 EST 2008

Florian Brucker <t... at torfbold.com> wrote:
> Given a dictionary, I want to create a clustered version of it,
> collecting keys that have the same value:
>
>  >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>  >>> cluster(d)
> {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
>
> That is, generate a new dict which holds for each value of the old dict
> a list of the keys of the old dict that have that very value.

>>> from collections import defaultdict
>>> def cluster(d):
...     if not hasattr(d, 'iteritems'):
...         d = dict((x,y) for x,y in enumerate(d))
...     cluster = defaultdict(list)
...     for key, value in d.iteritems():
...         cluster[value].append(key)
...     return cluster
...
>>> d1 = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>>> d2 = ['a','a','b','c','b','d','d']
>>> cluster(d1)
defaultdict(<type 'list'>, {1: ['a', 'c', 'd'], 2: ['b', 'e'], 3:
['f']})
>>> cluster(d2)
defaultdict(<type 'list'>, {'a': [0, 1], 'c': [3], 'b': [2, 4], 'd':
[5, 6]})

defaultdicts act pretty much exactly like dicts, except for the
default-value-for-everything behaviour, but if you'd rather have pure
dicts, just coerce with a dict(cluster) at the end of the function.

My version converts lists to dicts so the same core behaviour will
work. At worst it'll step through a sequence twice, but if it's passed
a dict it'll only do it once, unlike yours' which steps through each
twice no matter what (in order to obtain the unique values you
require).

A side note: if you're using Python2.4+, you can replace your unique()
function with the built-in set().