[Python-ideas] Possible new itertool: comm()

Paul Moore p.f.moore at gmail.com
Wed Jan 7 16:32:45 CET 2015


On 7 January 2015 at 14:19, Victor Stinner <victor.stinner at gmail.com> wrote:
> I never used the UNIX comm tool. I didn't know that it exists :-)
>
> Can you maybe explain the purpose of the tool/your function? Show an example.

The function has doctests included which may help.

Basically, take 2 sorted lists (the Unix tool takes 2 sorted files of
text) and merge them, preserving order, with each item marked as
"only occurs in the first input", "only occurs in the second input"
or "occurs in both inputs". Duplicates are matched one for one. So,
for example, with inputs 1,1,2,4 and 1,3,4 we would get:

both: 1
first: 1
first: 2
second: 3
both: 4

The problem with difflib/diff is mainly that it does a *lot* more work
than is needed - for sorted input, a fast single-pass algorithm is
enough, whereas a general diff has to search for common subsequences
because it cannot assume any ordering.
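
To make the single-pass idea concrete, here is a minimal sketch. It is
not the function from the proposal, just an illustration. It assumes
both inputs are sorted ascending and that their items support == and <;
the tag strings "first", "second" and "both" are made up for this example.

```python
def comm(first, second):
    """Yield (tag, item) pairs from two sorted iterables in one pass.

    Illustrative sketch only: assumes both inputs are sorted ascending
    and that items can be compared with == and <.
    """
    it1, it2 = iter(first), iter(second)
    done = object()  # sentinel marking an exhausted input
    a, b = next(it1, done), next(it2, done)
    while a is not done and b is not done:
        if a == b:
            # Same item at the head of both inputs: consume one from each.
            yield ("both", a)
            a, b = next(it1, done), next(it2, done)
        elif a < b:
            yield ("first", a)
            a = next(it1, done)
        else:
            yield ("second", b)
            b = next(it2, done)
    # At most one input still has items left; drain it.
    while a is not done:
        yield ("first", a)
        a = next(it1, done)
    while b is not done:
        yield ("second", b)
        b = next(it2, done)


for tag, item in comm([1, 1, 2, 4], [1, 3, 4]):
    print(f"{tag}: {item}")
```

Each item is looked at once, so the work is linear in the total length
of the two inputs, and nothing beyond the two current items is held in
memory.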

Paul

