[Python-ideas] Possible new itertool: comm()

Cameron Simpson cs at zip.com.au
Tue Jan 6 22:04:55 CET 2015


On 06Jan2015 09:14, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
>> On Jan 6, 2015, at 8:22 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> In writing a utility script today, I found myself needing to do
>> something similar to what the Unix "comm" utility does - take two
>> sorted iterators, and partition the values into "only in the first",
>> "only in the second", and "in both" groups.
>
>As far as I can tell, this would be a very rare need.

Really? I do this on an ad hoc basis in shell scripts a lot. I think it might 
just be rare for you.

In Python I would generally reach for sets, but that is at least partly
because sets are there and the stdlib doesn't have a comm(). With sets I
inherently need finite sources, I don't get to yield results progressively
from ordered iterators, and all the data has to fit in memory.
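
For comparison, a minimal sketch of the set-based approach, showing where
those limits come from. The name comm_sets is made up for illustration; it is
not a stdlib function.

```python
def comm_sets(first, second):
    """Partition two finite iterables into (only_first, only_second, both).

    Both inputs are consumed completely and held in memory before any
    result is available, and the original ordering is lost.
    """
    a, b = set(first), set(second)
    return a - b, b - a, a & b


only_first, only_second, both = comm_sets([1, 2, 4, 6], [2, 3, 4, 5])
```

Nothing comes back until both inputs are exhausted, so an unbounded iterator
never produces a result.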

The most obvious Python use case I actually have to hand is almost an abuse
of the suggested comm(): a log merge tool I wrote for merging multiple logs.
In that case the "common" set is always empty, hence the "abuse" idea, though
really it is just a corner case rather than abuse.

It gets run many times every day.

Reviewing the code, I notice it starts with:

  from heapq import merge

It strikes me that it might be easier to write comm() as a wrapper to 
heapq.merge. Though that wouldn't handle Steven's "unorderable items" case.
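
A rough sketch of what such a wrapper might look like, offered as one possible
shape and not as a proposed implementation. The tag constants and the
(tag, value) output format are assumptions made for illustration. Both inputs
must already be sorted, and the items must be mutually orderable.

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter

# Hypothetical tags; the numbers echo the column numbers of Unix comm.
ONLY_FIRST, ONLY_SECOND, BOTH = 1, 2, 3


def comm(first, second):
    """Lazily yield (tag, value) pairs from two sorted iterables.

    Duplicates are paired one-for-one: a value seen twice in `first`
    and once in `second` yields one BOTH and one ONLY_FIRST.
    """
    # Tag each item with its source, then let heapq.merge interleave the
    # two streams. Tuples compare by value first, then by tag.
    tagged = merge(
        ((value, ONLY_FIRST) for value in first),
        ((value, ONLY_SECOND) for value in second),
    )
    # Each run of equal values is small, so listing its tags is cheap.
    for value, run in groupby(tagged, key=itemgetter(0)):
        tags = [tag for _, tag in run]
        n_first = tags.count(ONLY_FIRST)
        n_second = len(tags) - n_first
        n_both = min(n_first, n_second)
        for _ in range(n_both):
            yield BOTH, value
        for _ in range(n_first - n_both):
            yield ONLY_FIRST, value
        for _ in range(n_second - n_both):
            yield ONLY_SECOND, value


result = list(comm([1, 2, 4, 6], [2, 3, 4, 5]))
```

Because merge() and groupby() are both lazy, this yields results progressively
and only ever holds one run of equal values in memory. It still relies on the
"<" comparison, so it does nothing for unorderable items.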

Cheers,
Cameron Simpson <cs at zip.com.au>

If it sticks, force it.  If it breaks, it needed replacing anyway.

