Overlap in python
Bearophile
bearophileHUGS at lycos.com
Tue Aug 4 16:54:16 EDT 2009
Jay Bird:
> I've been trying to figure out a simple algorithm on how to combine a
> list of parts that have 1D locations that overlap into a non-
> overlapping list. For example, here would be my input:
>
> part name location
> a 5-9
> b 7-10
> c 3-6
> d 15-20
> e 18-23
>
> And here is what I need for an output:
> part name location
> c.a.b 3-10
> d.e 15-23
That's sometimes known as "The Set Union Problem on Intervals".
Knowing the name of a problem is sometimes half the work :-) People
have written papers on this.
This is a O(N^2) force-brute algorithm, you can try it on your data:
data = """part name location
a 5-9
b 7-10
c 3-6
d 15-20
e 18-23"""
pairs = map(str.split, data.splitlines()[1:])
triples = [map(int, i.split("-")) + [name] for (name, i) in pairs]
overlap = lambda x1,y1, x2,y2: y1 >= x2 and x1 <= y2
results = []
for (x1,y1,name) in triples:
for i, res in enumerate(results):
if overlap(x1,y1, res[0],res[1]):
results[i].append(name)
results[i][0] = min(x1, res[0])
results[i][1] = max(y1, res[1])
break
else:
results.append([x1, y1, name])
print results
# Output: [[3, 10, 'a', 'b', 'c'], [15, 23, 'd', 'e']]
If you have have a lot of data then it will be too much slow, and you
have to use a more complex algorithm.
If your intervals are many, they are small, and they are spread in a
not too much large range of possible values, you can create an integer
array of indexes, and you can "paint" intervals on it.
Bye,
bearophile
More information about the Python-list
mailing list