efficient interval containment lookup

Per Freem perfreem at yahoo.com
Tue Jan 13 05:58:03 CET 2009


i forgot to add, my naive_find is:

def naive_find(intervals, start, stop):
  results = []
  for interval in intervals:
    if interval.start >= start and interval.stop <= stop:
      results.append(interval)
  return results

On Jan 12, 11:55 pm, Per Freem <perfr... at yahoo.com> wrote:
> On Jan 12, 10:58 pm, Steven D'Aprano
>
>
>
> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
> > On Mon, 12 Jan 2009 14:49:43 -0800, Per Freem wrote:
> > > thanks for your replies -- a few clarifications and questions. the
> > > is_within operation is containment, i.e. (a,b) is within (c,d) iff a
> > >>= c and b <= d. Note that I am not looking for intervals that
> > > overlap... this is why interval trees seem to me to not be relevant, as
> > > the overlapping interval problem is way harder than what I am trying to
> > > do. Please correct me if I'm wrong on this...
>
> > To test for contained intervals:
> > a >= c and b <= d
>
> > To test for overlapping intervals:
>
> > not (b < c or a > d)
>
> > Not exactly what I would call "way harder".
>
> > --
> > Steven
>
> hi Steven,
>
> i found an implementation (which is exactly how i'd write it based on
> the description) here:http://hackmap.blogspot.com/2008/11/python-interval-tree.html
>
> when i use this however, it comes out either significantly slower or
> equal to a naive search. my naive search just iterates through a
> smallish list of intervals and for each one says whether they overlap
> with each of a large set of intervals.
>
> here is the exact code i used to make the comparison, plus the code at
> the link i have above:
>
> class Interval():
>     def __init__(self, start, stop):
>         self.start = start
>         self.stop = stop
>
> import random
> import time
> num_ints = 30000
> init_intervals = []
> for n in range(0,
> num_ints):
>     start = int(round(random.random()
> *1000))
>     end = start + int(round(random.random()*500+1))
>     init_intervals.append(Interval(start, end))
> num_ranges = 900
> ranges = []
> for n in range(0, num_ranges):
>   start = int(round(random.random()
> *1000))
>   end = start + int(round(random.random()*500+1))
>   ranges.append((start, end))
> #print init_intervals
> tree = IntervalTree(init_intervals)
> t1 = time.time()
> for r in ranges:
>   tree.find(r[0], r[1])
> t2 = time.time()
> print "interval tree: %.3f" %((t2-t1)*1000.0)
> t1 = time.time()
> for r in ranges:
>   naive_find(init_intervals, r[0], r[1])
> t2 = time.time()
> print "brute force: %.3f" %((t2-t1)*1000.0)
>
> on one run, i get:
> interval tree: 8584.682
> brute force: 8201.644
>
> is there anything wrong with this implementation? it seems very right
> to me but i am no expert. any help on this would be relly helpful.




More information about the Python-list mailing list