Overlap in python

kj no.email at please.post
Tue Aug 4 17:57:25 EDT 2009


In <bf29ddcb-347f-49ca-a1ef-66cd5fae4e02 at z4g2000prh.googlegroups.com> Jay Bird <jay.bird0804 at gmail.com> writes:

>Hi everyone,

>I've been trying to figure out a simple algorithm on how to combine a
>list of parts that have 1D locations that overlap into a non-
>overlapping list.  For example, here would be my input:

>part name   location
>a                  5-9
>b                  7-10
>c                  3-6
>d                  15-20
>e                  18-23

>And here is what I need for an output:
>part name   location
>c.a.b            3-10
>d.e               15-23

>I've tried various methods, which all fail.  Does anyone have an idea
>how to do this?

This problem is basically isomorphic to the problem of finding
connected components in an undirected graph.  If your parts are
nodes in a graph, there's an edge between them if they overlap.

The following is probably overkill, but it works.  It uses the
module pygraph, which you can download from

http://code.google.com/p/python-graph/

The real work is done by its connected_components function.


import pygraph
from pygraph.algorithms.accessibility import connected_components

class Part(object):
    def __init__(self, name, start, finish):
        self.name = name
        if start >= finish:
            raise ValueError("start must be strictly less than finish")
        self.start = start
        self.finish = finish
    def __str__(self):
        return self.name

def combine_parts(parts):
    gr = pygraph.graph()
    gr.add_nodes(parts)
    # connect the nodes
    for i in range(len(parts) - 1):
        part_i = parts[i]
        for j in range(i + 1, len(parts)):
            part_j = parts[j]
            if (part_j.start <= part_i.finish <= part_j.finish or
                part_i.start <= part_j.finish <= part_i.finish):
                gr.add_edge(part_i, part_j)

    cc = connected_components(gr)
    c2n = {}
    # build connected components as lists
    for k, v in cc.items():
        c2n.setdefault(v, []).append(k)

    ret = []
    for nodes in c2n.values():
        nodes.sort(key=lambda x: x.start)
        name = '.'.join(x.name for x in nodes)
        start = min(x.start for x in nodes)
        finish = max(x.finish for x in nodes)
        ret.append((name, start, finish))

    return ret

if __name__ == '__main__':
    data = [Part('a', 5, 9),
            Part('b', 7, 10),
            Part('c', 3, 6),
            Part('d', 15, 20),
            Part('e', 18, 23)]
    print combine_parts(data)


When you run the script above, the output is

[('c.a.b', 3, 10), ('d.e', 15, 23)]

HTH,

kynn



More information about the Python-list mailing list