[Python-ideas] Possible new itertool: comm()
Steven D'Aprano
steve at pearwood.info
Tue Jan 6 19:00:05 CET 2015
On Tue, Jan 06, 2015 at 04:22:23PM +0000, Paul Moore wrote:
> In writing a utility script today, I found myself needing to do
> something similar to what the Unix "comm" utility does - take two
> sorted iterators, and partition the values into "only in the first",
> "only in the second", and "in both" groups.
It seems awfully specialised to me, so I'm not sure that it belongs in
itertools. But for what it's worth, here's my implementation which I
think is easier to understand than yours, and covers the unfortunate
case where you have unorderable items such as float NANs in your data.
def comm(values1, values2):
"""Partition 2 sorted iterators into common, only in 1st, and only in 2nd.
>>> list(comm([1,2,3], [2,3,4]))
[('<', 1), ('=', 2), ('=', 3), ('>', 4)]
>>> list(comm([0,1,2,3], [2,3,4,5]))
[('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5)]
>>> list(comm([0,1,2,3,6], [2,3,4,5]))
[('<', 0), ('<', 1), ('=', 2), ('=', 3), ('>', 4), ('>', 5), ('<', 6)]
>>> list(comm([0,1], []))
[('<', 0), ('<', 1)]
>>> list(comm([], [0,1]))
[('>', 0), ('>', 1)]
>>> list(comm([0,1], [0,1]))
[('=', 0), ('=', 1)]
>>> list(comm([], []))
[]
>>> list(comm([1,2,3,4], [2,3,float('nan'),4]))
[('<', 1), ('=', 2), ('=', 3), ('?', (4, nan)), ('>', 4)]
"""
MISSING = object()
it1 = iter(values1)
it2 = iter(values2)
a = next(it1, MISSING)
b = next(it2, MISSING)
while True:
if a is MISSING:
if b is MISSING:
return
yield ('>', b)
for x in it2:
yield ('>', x)
return
if b is MISSING:
assert a is not MISSING
yield ('<', a)
for x in it1:
yield ('<', x)
return
if a == b:
yield ('=', a) # Or could use b.
a = next(it1, MISSING)
b = next(it2, MISSING)
elif a < b:
yield ('<', a)
a = next(it1, MISSING)
elif a > b:
yield ('>', b)
b = next(it2, MISSING)
else:
# Unorderable a and b, like float NANs.
# Maybe we should raise instead?
yield ('?', (a, b))
a = next(it1, MISSING)
b = next(it2, MISSING)
If you prefer this version, feel free to just use it. If you need a
licence, consider it being under the MIT licence.
(Over the next day or so, time permitting, I'll try to put this on
ActiveState's Python recipes and make the licence more official.)
--
Steve
More information about the Python-ideas
mailing list