Overlap in python

Scott David Daniels Scott.Daniels at Acm.Org
Wed Aug 5 17:12:48 CEST 2009

```Jay Bird wrote:
> Hi everyone,
>
> I've been trying to figure out a simple algorithm on how to combine a
> list of parts that have 1D locations that overlap into a non-
> overlapping list.  For example, here would be my input:
>
> part name   location
> a                  5-9
> b                  7-10
> c                  3-6
> d                  15-20
> e                  18-23
>
> And here is what I need for an output:
> part name   location
> c.a.b            3-10
> d.e               15-23
>
> I've tried various methods, which all fail.  Does anyone have an idea
> how to do this?
>
> Thank you very much!
> Jay

I once had to do this for finding nested block structure.
The key for me was a sort order:  start, -final.

Having not seen it here (though I looked a bit), here's one:

class Entry(object):
'''An entry is a name and range'''
def __init__(self, line):
self.name, startstop = line.split()
start, stop = startstop.split('-')
self.start, self.stop = int(start), int(stop)

def combined_ranges(lines):
'''Create Entries in "magic order", and produce ranges.

The "magic order" makes least element with longest range first, so
overlaps show up in head order, with final tail first among equals.
'''
# Fill in our table (ignoring blank lines), then sort by magic order
elements = [Entry(line) for line in lines if line.strip()]
elements.sort(key=lambda e: (e.start, -e.stop))

# Now produce resolved ranges.  Grab the start
gen = iter(elements)
first = gen.next()

# For the remainder, combine or produce
for v in gen:
if v.start <= first.stop:
# on overlap, merge in new element (may update stop)
first.name += '.' + v.name
if first.stop < v.stop:
first.stop = v.stop
else:
yield first
first = v
# And now produce the last element we covered
yield first

# Demo:
sample = '''part name   location
a                  5-9
b                  7-10
c                  3-6
d                  15-20
e                  18-23
'''
source = iter(sample.split('\n')) # source of lines, opened file?
for interval in combined_range(source):
print '%s  %s-%s' % (interval.name, interval.start, interval.stop)

Prints:
c.a.b  3-10
d.e  15-23

--Scott David Daniels
Scott.Daniels at Acm.Org

```