Merging ordered lists

Raymond Hettinger python at rcn.com
Tue Jun 3 02:08:06 EDT 2008


> Thanks for the tip; itertools never ceases to amaze. One issue:
> groupby doesn't seem to remove all duplicates, just consecutive ones
> (for lists of strings and integers, at least):
>
> >>> [k for k, g in itertools.groupby(list("asdfdfffdf"))]
>
> ['a', 's', 'd', 'f', 'd', 'f', 'd', 'f']

That's why the example ran imerge() first.  That takes several sorted
inputs and gives a single sorted output.  Then, groupby() eliminates
the dups from the sorted input where the equal valued entries have
been brought together.


> Another issue: dropping everything into a heap and pulling it back out
> looks like it loses the original ordering, which isn't necessarily
> alphabetical, but "however the user wants to organize the
> spreadsheet". That's why I originally avoided using
> sorted(set(itertools.chain(*sources))). Do you see another way around
> this?

If your inputs were already sorted, then the above algorithms maintain
that order and eliminate duplicates.

If the inputs were not sorted, then I don't think you have a precise
idea of what it means to merge them while preserving order.   For
example if the inputs are XYZPDQ and bYlmPz, then what does a merged
sequence look like once the Y and P duplicates are removed? Is it
XZPDQblmz or some intermix of the two like XbZlPmDzQ?  If it is the
first of those, then the answer is simple:

seen = set()
for elem in chain(*sources):
    if elem in seen:
        continue
    yield elem
    seen.add(elem)

Raymond








More information about the Python-list mailing list