Overlapping region resolution

MRAB google at mrabarnett.plus.com
Thu May 21 11:54:47 EDT 2009


psaffrey at googlemail.com wrote:
> This may be an algorithmic question, but I'm trying to code it in
> Python, so...
> 
> I have a list of pairwise regions, each with an integer start and end
> and a float data point. There may be overlaps between the regions. I
> want to resolve this into an ordered list with no overlapping
> regions.
> 
> My initial solution was to sort the list by the start point, and then
> compare each adjacent region, clipping any overlapping section in
> half. I've attached code at the bottom. Unfortunately, this does not
> work well if you have sections that have three or more overlapping
> regions.
> 
> A more general solution is to work out where all the overlaps are
> before I start. Then I can break up the region space based on what
> regions overlap each section and take averages of all the datapoints
> that are present in a particular section. Devising an algorithm to do
> this is making my brain hurt. Any ideas?
> 
> Peter
> 
> 
> # also validates the data
> def clipRanges(regions):
> 	for i in range(len(regions) - 1):
> 		thispoint = regions[i]
> 		nextpoint = regions[i+1]
> 		assert thispoint[1] > thispoint[0] and	nextpoint[1] > nextpoint[0],
> "point read not valid"
> 		thisend = thispoint[1]
> 		nextstart = nextpoint[0]
> 		diff = thisend - nextstart
> 		# a difference of zero is too close together
> 		if diff > -1:
> 			if diff % 2 == 1:
> 				diff += 1
> 			correction = diff / 2
> 			newend = thisend - correction
> 			newstart = newend + 1
> 			assert newend > thispoint[0] and nextpoint[1] > newstart, "new
> range not valid!"
> 			newthispoint = (thispoint[0], newend, thispoint[2])
> 			newnextpoint = (newstart, nextpoint[1], nextpoint[2])
> 			regions[i] = newthispoint
> 			regions[i+1] = newnextpoint
> 	return regions
> 
> regions = [ (0,10,2.5), (12,22,3.5), (15,25,1.2), (23, 30,0.01), (27,
> 37,1.23), (30, 35, 1.45) ]
> regions2 = [ (0,10,2.5), (1,11,1.1), (2,12,1.2) ]
> 
> # works fine, produces [(0, 10, 2.5), (12, 18, 3.5), (19, 24, 1.2),
> (25, 28, 0.01), (29, 33, 1.23), (34, 35, 1.45)]
> print clipRanges(regions)
> # violates "new range not valid" assertion
> print clipRanges(regions2)

Is the upper bound of a range inclusive or exclusive? If it's exclusive
(like in Python) then it's empty only if lower == upper, and an overlap
occurs only if first.upper > second.lower (and you can have newstart ==
newend in your code).



More information about the Python-list mailing list