Grouping pairs - suggested tools

Peter Otten __peter__ at web.de
Tue Sep 21 06:13:56 EDT 2010


Astley Le Jasper wrote:

> I have a list of tuples that indicate a relationship, ie a is related
> to b, b is related to c etc etc. What I want to do is cluster these
> relationships into groups.  An item will only be associated with a
> single cluster.
> 
> Before I started, I wondered if there was any particular tool within
> Python I should be looking at. I don't expect anyone to code this for
> me, just say ... "you need to look at using x". I was going to use
> populate a dictionary and
> 
> Sorry for being so vague.
> 
> Example Data:
> [(a,b)
> (a,c)
> (a,d)
> (b,c)
> (b,d)
> (c,d)
> (e,f)
> (e,g)
> (f,g)
> (h,i)]
> 
> Output (grouping id, item ref)
> (1,a),
> (1,b),
> (1,c),
> (1,d),
> (2,e),
> (2,f),
> (2,g),
> (3,h),
> (3,i)

A straightforward implementation:

$ cat group_edges.py
def find_groups(edges):
    lookup = {} # node --> group
    groups = {} # id(group) --> group
    for a, b in edges:               
        if a in lookup:              
            if b in lookup:          
                ga = lookup[a]       
                gb = lookup[b]       
                if ga is not gb:     
                    # merge groups   
                    if len(ga) < len(gb):
                        # always update the smaller group
                        ga, gb = gb, ga                  
                    del groups[id(gb)]                   
                    ga.update(gb)                        
                    for k in gb:                         
                        lookup[k] = ga                   
            else:                                        
                # add b to a's group                     
                ga = lookup[a]
                ga.add(b)
                lookup[b] = ga
        elif b in lookup:
            # add a to b's group
            gb = lookup[b]
            gb.add(a)
            lookup[a] = gb
        else:
            # create new group
            g = set((a, b))
            groups[id(g)] = g
            lookup[a] = lookup[b] = g
    return groups.values()

def show_groups(edges):
    groups = sorted(sorted(g) for g in find_groups(edges))
    for i, group in enumerate(groups, 1):
        for node in sorted(group):
            print i, node

if __name__ == "__main__":
    edges = [
    ('a', 'b'),
    ('a', 'c'),
    ('a', 'd'),
    ('b', 'c'),
    ('b', 'd'),
    ('c', 'd'),
    ('e', 'f'),
    ('e', 'g'),
    ('f', 'g'),
    ('h', 'i'),
    ("lonesome john", "lonesome john")]

    show_groups(edges)
    print

    show_groups([('a', 'c'), ('b', 'd'), ('c', 'd')])

$ python group_edges.py
1 a
1 b
1 c
1 d
2 e
2 f
2 g
3 h
3 i
4 lonesome john

1 a
1 b
1 c
1 d

Untested.

Peter



More information about the Python-list mailing list