efficient interval containment lookup

Per Freem perfreem at yahoo.com
Mon Jan 12 23:55:36 EST 2009


On Jan 12, 10:58 pm, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> On Mon, 12 Jan 2009 14:49:43 -0800, Per Freem wrote:
> > thanks for your replies -- a few clarifications and questions. the
> > is_within operation is containment, i.e. (a,b) is within (c,d) iff a
> >>= c and b <= d. Note that I am not looking for intervals that
> > overlap... this is why interval trees seem to me to not be relevant, as
> > the overlapping interval problem is way harder than what I am trying to
> > do. Please correct me if I'm wrong on this...
>
> To test for contained intervals:
> a >= c and b <= d
>
> To test for overlapping intervals:
>
> not (b < c or a > d)
>
> Not exactly what I would call "way harder".
>
> --
> Steven

hi Steven,

i found an implementation (which is exactly how i'd write it based on
the description) here: http://hackmap.blogspot.com/2008/11/python-interval-tree.html

when i use this however, it comes out either significantly slower or
equal to a naive search. my naive search just iterates through a
smallish list of intervals and for each one says whether they overlap
with each of a large set of intervals.

here is the exact code i used to make the comparison, plus the code at
the link i have above:

class Interval():
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

import random
import time
num_ints = 30000
init_intervals = []
for n in range(0,
num_ints):
    start = int(round(random.random()
*1000))
    end = start + int(round(random.random()*500+1))
    init_intervals.append(Interval(start, end))
num_ranges = 900
ranges = []
for n in range(0, num_ranges):
  start = int(round(random.random()
*1000))
  end = start + int(round(random.random()*500+1))
  ranges.append((start, end))
#print init_intervals
tree = IntervalTree(init_intervals)
t1 = time.time()
for r in ranges:
  tree.find(r[0], r[1])
t2 = time.time()
print "interval tree: %.3f" %((t2-t1)*1000.0)
t1 = time.time()
for r in ranges:
  naive_find(init_intervals, r[0], r[1])
t2 = time.time()
print "brute force: %.3f" %((t2-t1)*1000.0)

on one run, i get:
interval tree: 8584.682
brute force: 8201.644

is there anything wrong with this implementation? it seems very right
to me but i am no expert. any help on this would be relly helpful.



More information about the Python-list mailing list