Merging multiple sorted sequences.
cs at zip.com.au
Wed Apr 12 18:52:13 EDT 2017
On 12Apr2017 23:15, Erik <python at lucidity.plus.com> wrote:
>I need to be able to lazily merge a variable number of already-sorted(*)
>variable-length sequences into a single sorted sequence. The merge should
>continue until the longest sequence has been exhausted.
>(*) They may in practice be a lazy source of data known to only ever
>be generated in an order that's "sorted" from the POV of this
>I have the following - can anyone think of any improvements (other
>than indentation style on the try/excepts ;) )? This seems like the
>sort of thing for which there might be a built-in or recipe that I've
It can be improved. The big cost in yours is notionally the sort, which you do
on every yield.
For comparison, here's mine:
It keeps a heap (see the "heapq" standard module) containing tuples of the
latest item from each source and the source iterable i.e. (item, iterable).
That way the lowest item is always first on the heap, so you pop it off, which
gets you (item, iterable). Yield the item. Fetch the next item from iterable.
If that gets StopIteration, that is exhausted. Otherwise insert the new item
and the iterable back into the heap.
Repeat until the heap is empty.
In this way you discard exhausted iterables and never need to re-examine them,
and the sorting overhead is just heap insertion which is O(log n) for the heap
size. Your method is O(n log n) for each item (sorting the collection of head
The heap is faster because it makes a much weaker ordering guarentee, and for
us the feature is that the front of the heap is the lowest item; the rest of
the heap is partially ordered, not fully ordered. Insert and pop preserve that
Cameron Simpson <cs at zip.com.au>
More information about the Python-list