[Python-ideas] Possible new itertool: comm()

Steven D'Aprano steve at pearwood.info
Tue Jan 6 19:00:05 CET 2015


On Tue, Jan 06, 2015 at 04:22:23PM +0000, Paul Moore wrote:
> In writing a utility script today, I found myself needing to do
> something similar to what the Unix "comm" utility does - take two
> sorted iterators, and partition the values into "only in the first",
> "only in the second", and "in both" groups.

It seems awfully specialised to me, so I'm not sure that it belongs in 
itertools. But for what it's worth, here's my implementation which I 
think is easier to understand than yours, and covers the unfortunate 
case where you have unorderable items such as float NANs in your data.

def comm(values1, values2):
    """Partition 2 sorted iterators into common, only in 1st, and only in 2nd.

    >>> list(comm([1,2,3], [2,3,4]))
    [('<', 1), ('=', 2), ('=', 3), ('>', 4)]
    >>> list(comm([0,1,2,3], [2,3,4,5]))
    [('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5)]
    >>> list(comm([0,1,2,3,6], [2,3,4,5]))
    [('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5), ('<', 6)]
    >>> list(comm([0,1], []))
    [('<', 0), ('<', 1)]
    >>> list(comm([], [0,1]))
    [('>', 0), ('>', 1)]
    >>> list(comm([0,1], [0,1]))
    [('=', 0), ('=', 1)]
    >>> list(comm([], []))
    []
    >>> list(comm([1,2,3,4], [2,3,float('nan'),4]))
    [('<', 1), ('=', 2), ('=', 3), ('?', (4, nan)), ('>', 4)]

    """
    MISSING = object()
    it1 = iter(values1)
    it2 = iter(values2)
    a = next(it1, MISSING)
    b = next(it2, MISSING)
    while True:
        if a is MISSING:
            if b is MISSING:
                return
            yield ('>', b)
            for x in it2:
                yield ('>', x)
            return
        if b is MISSING:
            assert a is not MISSING
            yield ('<', a)
            for x in it1:
                yield ('<', x)
            return
        if a == b:
            yield ('=', a)  # Or could use b.
            a = next(it1, MISSING)
            b = next(it2, MISSING)
        elif a < b:
            yield ('<', a)
            a = next(it1, MISSING)
        elif a > b:
            yield ('>', b)
            b = next(it2, MISSING)
        else:
            # Unorderable a and b, like float NANs.
            # Maybe we should raise instead?
            yield ('?', (a, b))
            a = next(it1, MISSING)
            b = next(it2, MISSING)


If you prefer this version, feel free to just use it. If you need a 
licence, consider it being under the MIT licence.

(Over the next day or so, time permitting, I'll try to put this on 
ActiveState's Python recipes and make the licence more official.)


-- 
Steve


More information about the Python-ideas mailing list