Grouping pairs - suggested tools
Peter Otten
__peter__ at web.de
Tue Sep 21 12:13:56 CEST 2010
Astley Le Jasper wrote:
> I have a list of tuples that indicate a relationship, ie a is related
> to b, b is related to c etc etc. What I want to do is cluster these
> relationships into groups. An item will only be associated with a
> single cluster.
>
> Before I started, I wondered if there was any particular tool within
> Python I should be looking at. I don't expect anyone to code this for
> me, just say ... "you need to look at using x". I was going to use
> populate a dictionary and
>
> Sorry for being so vague.
>
> Example Data:
> [(a,b)
> (a,c)
> (a,d)
> (b,c)
> (b,d)
> (c,d)
> (e,f)
> (e,g)
> (f,g)
> (h,i)]
>
> Output (grouping id, item ref)
> (1,a),
> (1,b),
> (1,c),
> (1,d),
> (2,e),
> (2,f),
> (2,g),
> (3,h),
> (3,i)
A straightforward implementation:
$ cat group_edges.py
def find_groups(edges):
lookup = {} # node --> group
groups = {} # id(group) --> group
for a, b in edges:
if a in lookup:
if b in lookup:
ga = lookup[a]
gb = lookup[b]
if ga is not gb:
# merge groups
if len(ga) < len(gb):
# always update the smaller group
ga, gb = gb, ga
del groups[id(gb)]
ga.update(gb)
for k in gb:
lookup[k] = ga
else:
# add b to a's group
ga = lookup[a]
ga.add(b)
lookup[b] = ga
elif b in lookup:
# add a to b's group
gb = lookup[b]
gb.add(a)
lookup[a] = gb
else:
# create new group
g = set((a, b))
groups[id(g)] = g
lookup[a] = lookup[b] = g
return groups.values()
def show_groups(edges):
groups = sorted(sorted(g) for g in find_groups(edges))
for i, group in enumerate(groups, 1):
for node in sorted(group):
print i, node
if __name__ == "__main__":
edges = [
('a', 'b'),
('a', 'c'),
('a', 'd'),
('b', 'c'),
('b', 'd'),
('c', 'd'),
('e', 'f'),
('e', 'g'),
('f', 'g'),
('h', 'i'),
("lonesome john", "lonesome john")]
show_groups(edges)
print
show_groups([('a', 'c'), ('b', 'd'), ('c', 'd')])
$ python group_edges.py
1 a
1 b
1 c
1 d
2 e
2 f
2 g
3 h
3 i
4 lonesome john
1 a
1 b
1 c
1 d
Untested.
Peter
More information about the Python-list
mailing list