Overlap in python
kj
no.email at please.post
Tue Aug 4 17:57:25 EDT 2009
In <bf29ddcb-347f-49ca-a1ef-66cd5fae4e02 at z4g2000prh.googlegroups.com> Jay Bird <jay.bird0804 at gmail.com> writes:
>Hi everyone,
>I've been trying to figure out a simple algorithm on how to combine a
>list of parts that have 1D locations that overlap into a non-
>overlapping list. For example, here would be my input:
>part name location
>a 5-9
>b 7-10
>c 3-6
>d 15-20
>e 18-23
>And here is what I need for an output:
>part name location
>c.a.b 3-10
>d.e 15-23
>I've tried various methods, which all fail. Does anyone have an idea
>how to do this?
This problem is basically isomorphic to the problem of finding
connected components in an undirected graph. If your parts are
nodes in a graph, there's an edge between them if they overlap.
The following is probably overkill, but it works. It uses the
module pygraph, which you can download from
http://code.google.com/p/python-graph/
The real work is done by its connected_components function.
import pygraph
from pygraph.algorithms.accessibility import connected_components
class Part(object):
def __init__(self, name, start, finish):
self.name = name
if start >= finish:
raise ValueError("start must be strictly less than finish")
self.start = start
self.finish = finish
def __str__(self):
return self.name
def combine_parts(parts):
gr = pygraph.graph()
gr.add_nodes(parts)
# connect the nodes
for i in range(len(parts) - 1):
part_i = parts[i]
for j in range(i + 1, len(parts)):
part_j = parts[j]
if (part_j.start <= part_i.finish <= part_j.finish or
part_i.start <= part_j.finish <= part_i.finish):
gr.add_edge(part_i, part_j)
cc = connected_components(gr)
c2n = {}
# build connected components as lists
for k, v in cc.items():
c2n.setdefault(v, []).append(k)
ret = []
for nodes in c2n.values():
nodes.sort(key=lambda x: x.start)
name = '.'.join(x.name for x in nodes)
start = min(x.start for x in nodes)
finish = max(x.finish for x in nodes)
ret.append((name, start, finish))
return ret
if __name__ == '__main__':
data = [Part('a', 5, 9),
Part('b', 7, 10),
Part('c', 3, 6),
Part('d', 15, 20),
Part('e', 18, 23)]
print combine_parts(data)
When you run the script above, the output is
[('c.a.b', 3, 10), ('d.e', 15, 23)]
HTH,
kynn
More information about the Python-list
mailing list