efficient interval containment lookup

Per Freem perfreem at yahoo.com
Mon Jan 12 14:36:47 EST 2009


hello,

suppose I have two lists of intervals, one significantly larger than
the other.
For example listA = [(10, 30), (5, 25), (100, 200), ...] might contain
thousands
of elements while listB (of the same form) might contain hundreds of
thousands
or millions of elements.
I want to count how many intervals in listB are contained within every
listA. For example, if listA = [(10, 30), (600, 800)] and listB =
[(20, 25), (12, 18)] is the input, then the output should be that (10,
30) has 2 intervals from listB contained within it, while (600, 800)
has 0. (Elements of listB can be contained within many intervals in
listA, not just one.)

What is an efficient way to this?  One simple way is:

for a_range in listA:
  for b_range in listB:
    is_within(b_range, a_range):
      # accumulate a counter here

where is_within simply checks if the first argument is within the
second.

I'm not sure if it's more efficient to have the iteration over listA
be on the outside or listB.  But perhaps there's a way to index this
that makes things more efficient?  I.e. a smart way of indexing listA
such that I can instantly get all of its elements that are within some
element
of listB, maybe?  Something like a hash, where this look up can be
close to constant time rather than an iteration over all lists... if
there's any built-in library functions that can help in this it would
be great.

any suggestions on this would be awesome. thank you.



More information about the Python-list mailing list