[Python-ideas] Possible new itertool: comm()

Ian Cordasco graffatcolmingov at gmail.com
Tue Jan 6 17:33:16 CET 2015


On Tue, Jan 6, 2015 at 10:22 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> In writing a utility script today, I found myself needing to do
> something similar to what the Unix "comm" utility does - take two
> sorted iterators, and partition the values into "only in the first",
> "only in the second", and "in both" groups.
>
> I couldn't find an obvious implementation around, so I ended up having
> to write my own (see below).
>
> Would this be something useful to add to the itertools module? The
> reasons I think it might be worth including in the stdlib are:
>
> 1. The implementation is somewhat tricky to get right, especially the
> edge cases (I'm not 100% sure my implementation is completely right in
> this regard). So implementing it yourself each time is error-prone and
> time consuming.
> 2. The times I've needed this have been ad-hoc scripts (I'm on
> Windows, so while a Unix user might use a quick shell pipeline with
> comm, that's less convenient for me) where depending on a 3rd party
> distribution from PyPI is less ideal.
>
> There's plenty of room for API bikeshedding here, but the basic
> algorithm remains the same.
>
> What do people think?
>
> Paul
>
> def comm(it1, it2):
>     """Partition 2 sorted iterators into common, only in 1st, and only in 2nd.
>
>     >>> list(comm([1,2,3], [2,3,4]))
>     [('<', 1), ('=', 2), ('=', 3), ('>', 4)]
>     >>> list(comm([0,1,2,3], [2,3,4,5]))
>     [('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5)]
>     >>> list(comm([0,1,2,3,6], [2,3,4,5]))
>     [('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5), ('<', 6)]
>     >>> list(comm([0,1], []))
>     [('<', 0), ('<', 1)]
>     >>> list(comm([], [0,1]))
>     [('>', 0), ('>', 1)]
>     >>> list(comm([0,1], [0,1]))
>     [('=', 0), ('=', 1)]
>     >>> list(comm([], []))
>     []
>     """
>
>     it1 = iter(it1)
>     it2 = iter(it2)
>
>     it1_finished = False
>     it2_finished = False
>     try:
>         v1 = next(it1)
>     except StopIteration:
>         it1_finished = True
>     try:
>         v2 = next(it2)
>     except StopIteration:
>         it2_finished = True
>     while not (it1_finished or it2_finished):
>         if v1 < v2:
>             yield ('<', v1)
>             try:
>                 v1 = next(it1)
>             except StopIteration:
>                 it1_finished = True
>         elif v1 > v2:
>             yield ('>', v2)
>             try:
>                 v2 = next(it2)
>             except StopIteration:
>                 it2_finished = True
>         else:
>             yield ('=', v1)
>             try:
>                 v1 = next(it1)
>             except StopIteration:
>                 it1_finished = True
>             try:
>                 v2 = next(it2)
>             except StopIteration:
>                 it2_finished = True
>
>     if it1_finished and not it2_finished:
>         yield ('>', v2)
>         for v2 in it2:
>             yield ('>', v2)
>     if it2_finished and not it1_finished:
>         yield ('<', v1)
>         for v1 in it1:
>             yield ('<', v1)
>
> if __name__ == '__main__':
>     import doctest
>     doctest.testmod()
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

So my question is how well with this work with generators/iterators?
Your examples use lists, but it would be impossible to use this with
anything that isn't finite right?


More information about the Python-ideas mailing list