Overlap in python

Scott David Daniels Scott.Daniels at Acm.Org
Wed Aug 5 11:12:48 EDT 2009


Jay Bird wrote:
> Hi everyone,
> 
> I've been trying to figure out a simple algorithm on how to combine a
> list of parts that have 1D locations that overlap into a non-
> overlapping list.  For example, here would be my input:
> 
> part name   location
> a                  5-9
> b                  7-10
> c                  3-6
> d                  15-20
> e                  18-23
> 
> And here is what I need for an output:
> part name   location
> c.a.b            3-10
> d.e               15-23
> 
> I've tried various methods, which all fail.  Does anyone have an idea
> how to do this?
> 
> Thank you very much!
> Jay

I once had to do this for finding nested block structure.
The key for me was a sort order:  start, -final.

Having not seen it here (though I looked a bit), here's one:

class Entry(object):
     '''An entry is a name and range'''
     def __init__(self, line):
         self.name, startstop = line.split()
         start, stop = startstop.split('-')
         self.start, self.stop = int(start), int(stop)


def combined_ranges(lines):
     '''Create Entries in "magic order", and produce ranges.

     The "magic order" makes least element with longest range first, so
     overlaps show up in head order, with final tail first among equals.
     '''
     # Fill in our table (ignoring blank lines), then sort by magic order
     elements = [Entry(line) for line in lines if line.strip()]
     elements.sort(key=lambda e: (e.start, -e.stop))

     # Now produce resolved ranges.  Grab the start
     gen = iter(elements)
     first = gen.next()

     # For the remainder, combine or produce
     for v in gen:
         if v.start <= first.stop:
             # on overlap, merge in new element (may update stop)
             first.name += '.' + v.name
             if first.stop < v.stop:
                 first.stop = v.stop
         else:
             yield first
             first = v
     # And now produce the last element we covered
     yield first

# Demo:
sample = '''part name   location
     a                  5-9
     b                  7-10
     c                  3-6
     d                  15-20
     e                  18-23
     '''
source = iter(sample.split('\n')) # source of lines, opened file?
ignored = source.next() # discard heading
for interval in combined_range(source):
     print '%s  %s-%s' % (interval.name, interval.start, interval.stop)

Prints:
c.a.b  3-10
d.e  15-23


--Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-list mailing list