merging intervals repeatedly

Magdoll magdoll at gmail.com
Wed Mar 12 17:41:42 CET 2008


> Hi,
>
> The problem, as stated here, may have several solutions. For instance
> the following set of intervals also satisfies the constraint:
> (1,15), (20,40), (50,100)
>
> One question you should ask yourself is: do you want all solutions? or
> just one?
> If you want just one, there's another question: which one? the one with
> the most intervals? any one?

I actually don't know which solution I want, and that's why I keep
trying different solutions :P

> If you want all of them, then I suggest using prolog rather than python
> (I hope I won't be flamed for advocating another language here).

Will I be able to switch between using prolog & python back and forth
though? Cuz the bulk of my code will still be written in python and
this is just a very small part of it.

>
> > Is there already some existing code in Python that I can easily take
> > advantage of to produce this? Right now I've written my own simple
> > solution, which is just to maintain a list of the intervals. I can use
> > the Interval module, but it doesn't really affect much. I read one
> > interval from the input file at a time, and use bisect to insert it in
> > order. The problem comes with merging, which sometimes can be
> > cascading.
>
> > ex:
> > read (51,65) ==> put (51,65) in list
> > read (62,100) ==> put (62,100) in list (overlap only be 4 <= N)
> > read (50,66) ==> merge with (51,65) to become (50,66) ==> now can
> > merge with (62,100)
>
> The way this algorithm is presented suggests an additional constraint:
> you cannot merge two intervals if their overlap <= N. In that case,
> there is a single solution indeed...
> Nitpick: you cannot merge (50,66) and (62,100) since their overlap is
> still <= 5.

Sorry I keep messing up my example. I meant, I would merge them if
they overlap by >= N....but anyways.

>
> If you have a reasonable number of intervals, you're algorithm seems
> fine. But it is O(n**2), so in the case you read a lot of intervals and
> you observe unsatisfying performances, you will have to store the
> intervals in a cleverer data structure, see one of these:http://en.wikipedia.org/wiki/Interval_treehttp://en.wikipedia.org/wiki/Segment_tree

Thanks! Both of these look interesting and potentially useful :)



More information about the Python-list mailing list