[Tutor] Please help with comparision in range of numbers

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Fri Sep 16 00:38:43 CEST 2005



> 1427021_s_at	chr7:102786983-102794499	(+)
> 1452285_a_at	chr7:102786983-102794499	(+)
> 1445553_at	chr7:102848766-102910961	(-)
> 1420925_at	chr7:102863841-102887410	(+)
> 1450041_a_at	chr7:102863841-102887410	(+)
> 1447553_x_at	chr7:102899711-102899995	(-)
> 1418478_at	chr7:102991513-103023463	(-)
> 1452892_at	chr7:103132162-103291925	(-)
> 1430157_at	chr7:103292083-103297768	(+)

[some text cut]

> I want to find such instances of jack and jill's that fall in the same
> ranges and they should be in opposite to each other.


Hi Srinivas,


Let's break this down into two separate problems:

    1.  Finding Ranges that overlap.
    2.  Given two Ranges, see if they're in opposite directions.

I'm going to ignore problem 2 at the moment.  *grin*


If we don't care too much about efficiency, the following should help you
solve problem 1.

######
class Range(object):
    def __init__(self, low, high):
        assert low <= high
        self.low, self.high = low, high

    def is_overlapping(self, other):
        """Returns true if this range overlaps the other range.
        The exact boolean condition is copied from the
        classic textbook "Introduction to Algorithms, 2nd edition",
        section 14.3.
        """
        return (self.low <= other.high) and other.low <= self.high
######


For example:

######
>>> Range(1, 2).is_overlapping(Range(2, 3))
True
>>> Range(1, 2).is_overlapping(Range(3, 3))
False
>>> Range(1, 2).is_overlapping(Range(-2, 0))
False
>>> Range(1, 2).is_overlapping(Range(-42, 42))
True
######

So if you have sequence of Ranges like this, you can brute force and just
loop across all possible pairs, and do this is_overlapping() check.  You
may want to test this on a small sample dataset just to see that it works.

This approach, however, is slow, and gets slower as the dataset grows.
If we do care about efficiency, we have to do something a little smarter.

Interval trees or K-d Tree data structures are examples of data structures
that can help you do range-intersection queries much more quickly than a
brute-force approach.  There's an implementation of K-d trees in
biopython:

    http://biopython.org/docs/api/public/Bio.KDTree-module.html

so you may want to investigate this with the Biopython folks if speed is a
concern.  John Bentley wrote a nice survey article talking about these
range-searching algorithms here:

    http://portal.acm.org/citation.cfm?id=356797

although you might need an ACM account to get at the article.



> Truely, I always flunk and my mind gets blackedout when I see these
> number ranges.  I do not know how to write : check "if a range of
> numbers fall in the other range of numbers" condition.

Yeah, me too.  Check out the book reference in the code above: it helps to
clear things up about intervals and range checking.



More information about the Tutor mailing list