[Python-ideas] Possible new itertool: comm()
Cameron Simpson
cs at zip.com.au
Tue Jan 6 22:04:55 CET 2015
On 06Jan2015 09:14, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
>> On Jan 6, 2015, at 8:22 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> In writing a utility script today, I found myself needing to do
>> something similar to what the Unix "comm" utility does - take two
>> sorted iterators, and partition the values into "only in the first",
>> "only in the second", and "in both" groups.
>
>As far as I can tell, this would be a very rare need.
Really? I do this on an ad hoc basis in shell scripts a lot. I think it might
just be rare for you.
In Python I would generally choose sets, but that is at least partially
because sets are there and the stdlib doesn't have a comm. With sets, I
inherently need finite sources, and I don't get to yield results
progressively from ordered iterators. Both inputs also need to fit in memory.
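For illustration, a minimal sketch of the set-based approach described above. The name comm_sets is hypothetical and not from the thread. It assumes hashable, orderable items and finite inputs, and it only returns once both inputs have been fully consumed:

```python
def comm_sets(first, second):
    """Partition two finite iterables into (only_first, only_second, both).

    Hypothetical helper for illustration only. Both inputs are loaded
    into memory, and nothing is produced until both are exhausted.
    """
    a, b = set(first), set(second)
    # sorted() restores an ordering, since sets do not preserve one.
    return sorted(a - b), sorted(b - a), sorted(a & b)


only_first, only_second, both = comm_sets([1, 2, 4], [2, 3, 4, 5])
```

This is where the limitations show up: set() has to exhaust each source before any result is available, so an unbounded or very large iterator is out of the question.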
The most obvious Python use case I have to hand is almost an abuse of the
suggested comm(): a log merge tool I wrote for merging multiple logs. In that
case the "common" set is always empty, hence "abuse", though it is really just
a corner case.
It gets run many times every day.
Reviewing the code, I notice it starts with:
    from heapq import merge
It strikes me that it might be easier to write comm() as a wrapper to
heapq.merge. Though that wouldn't handle Steven's "unorderable items" case.
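A rough sketch of what such a wrapper might look like, not a proposed stdlib implementation. The (value, where) output format and the numeric tags are assumptions made here for illustration. It also assumes each input is sorted and strictly increasing (no duplicates within one input) and that items are mutually orderable, so it does not address the unorderable case:

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter


def comm(first, second):
    """Lazily yield (value, where) pairs from two sorted iterables.

    where is 1 (only in first), 2 (only in second) or 3 (in both).
    Assumes each input is strictly increasing and items are orderable.
    """
    # Tag each item with its source; tuples sort by value, then by tag.
    tagged = merge(((x, 1) for x in first), ((y, 2) for y in second))
    for value, group in groupby(tagged, key=itemgetter(0)):
        # With no duplicates per input, the tags sum to 1, 2 or 3.
        yield value, sum(tag for _, tag in group)


result = list(comm([1, 2, 4], [2, 3, 4, 5]))
```

Both heapq.merge and itertools.groupby are lazy, so results come out progressively and neither input has to fit in memory.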
Cheers,
Cameron Simpson <cs at zip.com.au>
If it sticks, force it. If it breaks, it needed replacing anyway.