How to identify which numbers in a list are within each others' range
Arnaud Delobelle
arnodel at googlemail.com
Fri Feb 1 20:07:09 CET 2008
On Feb 1, 1:28 pm, Arnaud Delobelle <arno... at googlemail.com> wrote:
> Yours is O(n^2) and \Omega(n^2). I think mine is O(max(c, nlogn))
> (assuming constant time access for dictionaries, but 'inside' could be
> replaced with a list) where c is the number of overlaps. I'll try to
> post a proof later (if it's true!). So if we are in a situation where
> overlaps are rare, it is in practice O(nlogn).
I have slightly modified overlaps() in order to make the analysis
easier (I have made overlaps() consider all intervals open). Here is
the new version:
def open_interval(i, (a, b)):
return (a, 1, i), (b, 0, i)
def overlaps(lst, interval=open_interval):
bounds = chain(*starmap(interval, enumerate(lst))) # 1
inside = {} # 2
for x, _, i in sorted(bounds): # 3
if inside.pop(i, None) is None: # 4
for j, y in inside.iteritems(): yield i, j # 5
inside[i] = x # 6
'Detailed' analysis of running overlaps(lst) follows, where
* lst is of length n
* the total number of overlaps in m
(assuming dict.__getitem__ and dic.__setitem__ are done in constant
time [1])
1. is a simple loop over lst, takes An
2. is constant (K)
3. The sorted(...) function will take Bnlogn
3,4. This loop is iterated 2n times, with the if condition it will
take Cn
5. This loop is iterated once for each overlap exacly. So it will take
Dm
6. This statement is executed n times, will take En
Altogether the time taken is:
K + (A+B+E)n + Bnlogn + Dm
So if m is dominated by nlogn, the algorithm is O(nlogn). Otherwise
one cannot hope for better thant O(m)!
--
Arnaud
[1] Otherwise one could use a combination of an array and a doubly
linked list to achieve constant time access, deletion and updating of
the 'inside' structure. Anyway even if it was log(n) the overall
complexity would be the same!
More information about the Python-list
mailing list