Overlapping region resolution
MRAB
google at mrabarnett.plus.com
Thu May 21 11:54:47 EDT 2009
psaffrey at googlemail.com wrote:
> This may be an algorithmic question, but I'm trying to code it in
> Python, so...
>
> I have a list of pairwise regions, each with an integer start and end
> and a float data point. There may be overlaps between the regions. I
> want to resolve this into an ordered list with no overlapping
> regions.
>
> My initial solution was to sort the list by the start point, and then
> compare each adjacent region, clipping any overlapping section in
> half. I've attached code at the bottom. Unfortunately, this does not
> work well if you have sections that have three or more overlapping
> regions.
>
> A more general solution is to work out where all the overlaps are
> before I start. Then I can break up the region space based on what
> regions overlap each section and take averages of all the datapoints
> that are present in a particular section. Devising an algorithm to do
> this is making my brain hurt. Any ideas?
>
> Peter
>
>
> # also validates the data
> def clipRanges(regions):
> for i in range(len(regions) - 1):
> thispoint = regions[i]
> nextpoint = regions[i+1]
> assert thispoint[1] > thispoint[0] and nextpoint[1] > nextpoint[0],
> "point read not valid"
> thisend = thispoint[1]
> nextstart = nextpoint[0]
> diff = thisend - nextstart
> # a difference of zero is too close together
> if diff > -1:
> if diff % 2 == 1:
> diff += 1
> correction = diff / 2
> newend = thisend - correction
> newstart = newend + 1
> assert newend > thispoint[0] and nextpoint[1] > newstart, "new
> range not valid!"
> newthispoint = (thispoint[0], newend, thispoint[2])
> newnextpoint = (newstart, nextpoint[1], nextpoint[2])
> regions[i] = newthispoint
> regions[i+1] = newnextpoint
> return regions
>
> regions = [ (0,10,2.5), (12,22,3.5), (15,25,1.2), (23, 30,0.01), (27,
> 37,1.23), (30, 35, 1.45) ]
> regions2 = [ (0,10,2.5), (1,11,1.1), (2,12,1.2) ]
>
> # works fine, produces [(0, 10, 2.5), (12, 18, 3.5), (19, 24, 1.2),
> (25, 28, 0.01), (29, 33, 1.23), (34, 35, 1.45)]
> print clipRanges(regions)
> # violates "new range not valid" assertion
> print clipRanges(regions2)
Is the upper bound of a range inclusive or exclusive? If it's exclusive
(like in Python) then it's empty only if lower == upper, and an overlap
occurs only if first.upper > second.lower (and you can have newstart ==
newend in your code).
More information about the Python-list
mailing list