[Python-ideas] Possible new itertool: comm()
Ian Cordasco
graffatcolmingov at gmail.com
Tue Jan 6 17:33:16 CET 2015
On Tue, Jan 6, 2015 at 10:22 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> In writing a utility script today, I found myself needing to do
> something similar to what the Unix "comm" utility does - take two
> sorted iterators, and partition the values into "only in the first",
> "only in the second", and "in both" groups.
>
> I couldn't find an obvious implementation around, so I ended up having
> to write my own (see below).
>
> Would this be something useful to add to the itertools module? The
> reasons I think it might be worth including in the stdlib are:
>
> 1. The implementation is somewhat tricky to get right, especially the
> edge cases (I'm not 100% sure my implementation is completely right in
> this regard). So implementing it yourself each time is error-prone and
> time consuming.
> 2. The times I've needed this have been ad-hoc scripts (I'm on
> Windows, so while a Unix user might use a quick shell pipeline with
> comm, that's less convenient for me) where depending on a 3rd party
> distribution from PyPI is less ideal.
>
> There's plenty of room for API bikeshedding here, but the basic
> algorithm remains the same.
>
> What do people think?
>
> Paul
>
> def comm(it1, it2):
> """Partition 2 sorted iterators into common, only in 1st, and only in 2nd.
>
> >>> list(comm([1,2,3], [2,3,4]))
> [('<', 1), ('=', 2), ('=', 3), ('>', 4)]
> >>> list(comm([0,1,2,3], [2,3,4,5]))
> [('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5)]
> >>> list(comm([0,1,2,3,6], [2,3,4,5]))
> [('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5), ('<', 6)]
> >>> list(comm([0,1], []))
> [('<', 0), ('<', 1)]
> >>> list(comm([], [0,1]))
> [('>', 0), ('>', 1)]
> >>> list(comm([0,1], [0,1]))
> [('=', 0), ('=', 1)]
> >>> list(comm([], []))
> []
> """
>
> it1 = iter(it1)
> it2 = iter(it2)
>
> it1_finished = False
> it2_finished = False
> try:
> v1 = next(it1)
> except StopIteration:
> it1_finished = True
> try:
> v2 = next(it2)
> except StopIteration:
> it2_finished = True
> while not (it1_finished or it2_finished):
> if v1 < v2:
> yield ('<', v1)
> try:
> v1 = next(it1)
> except StopIteration:
> it1_finished = True
> elif v1 > v2:
> yield ('>', v2)
> try:
> v2 = next(it2)
> except StopIteration:
> it2_finished = True
> else:
> yield ('=', v1)
> try:
> v1 = next(it1)
> except StopIteration:
> it1_finished = True
> try:
> v2 = next(it2)
> except StopIteration:
> it2_finished = True
>
> if it1_finished and not it2_finished:
> yield ('>', v2)
> for v2 in it2:
> yield ('>', v2)
> if it2_finished and not it1_finished:
> yield ('<', v1)
> for v1 in it1:
> yield ('<', v1)
>
> if __name__ == '__main__':
> import doctest
> doctest.testmod()
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
So my question is how well with this work with generators/iterators?
Your examples use lists, but it would be impossible to use this with
anything that isn't finite right?
More information about the Python-ideas
mailing list