[Python-ideas] Possible new itertool: comm()
Cameron Simpson
cs at zip.com.au
Tue Jan 6 22:09:59 CET 2015
On 06Jan2015 19:36, Antoine Pitrou <solipsis at pitrou.net> wrote:
>On Tue, 6 Jan 2015 18:22:44 +0000
>Paul Moore <p.f.moore at gmail.com> wrote:
>> On 6 January 2015 at 17:14, Raymond Hettinger
>> <raymond.hettinger at gmail.com> wrote:
>> >> On Jan 6, 2015, at 8:22 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> >>
>> >> In writing a utility script today, I found myself needing to do
>> >> something similar to what the Unix "comm" utility does - take two
>> >> sorted iterators, and partition the values into "only in the first",
>> >> "only in the second", and "in both" groups.
>> >
>> > As far as I can tell, this would be a very rare need.
>>
>> It's come up for me a few times, usually when trying to check two
>> lists of files to see which ones have been missed by a program, and
>> which ones the program thinks are present but no longer exist.
>
>Why don't you use sets for such things? Your iterator is really only
>useful for huge or unhashable inputs.
In my use case (an existing tool):
1) I'm merging log files of arbitrary size; I am _not_ going to suck them into
memory. A comm()-like function has a tiny and fixed memory footprint, versus an
unbounded one.
2) I want ordered output, and my inputs are already ordered; why on earth would
I impose a pointless sorting cost on my (currently linear) runtime?
Sets are the "obvious" Python way to do this, because comm() is more or less a
set intersection operation and sets are right there in Python. But for
unbounded sorted inputs and progressive output, they are a _bad_ choice.
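For illustration, here is a minimal sketch of what such a comm()-like generator might look like. The name comm, the (tag, value) output shape and the tags 1/2/3 are assumptions for this example, not an agreed API. It assumes both inputs are already sorted in ascending order and that their items are mutually comparable with "<". It holds only one pending item per input, so memory use stays fixed however long the inputs are, and it yields results in sorted order as it goes.

```python
def comm(a, b):
    """Yield (tag, value) pairs from two sorted iterables.

    tag is 1 (value only in a), 2 (value only in b) or 3 (value in both).
    Hypothetical interface, for illustration only.
    """
    _done = object()              # sentinel marking an exhausted input
    it_a, it_b = iter(a), iter(b)
    x = next(it_a, _done)
    y = next(it_b, _done)
    # Walk both inputs in step, always advancing the side with the smaller item.
    while x is not _done and y is not _done:
        if x < y:
            yield 1, x
            x = next(it_a, _done)
        elif y < x:
            yield 2, y
            y = next(it_b, _done)
        else:
            yield 3, x
            x = next(it_a, _done)
            y = next(it_b, _done)
    # At most one input still has items; drain whichever it is.
    while x is not _done:
        yield 1, x
        x = next(it_a, _done)
    while y is not _done:
        yield 2, y
        y = next(it_b, _done)


if __name__ == "__main__":
    for tag, value in comm([1, 2, 4, 6], [2, 3, 4, 7]):
        print(tag, value)
```

Because it is a generator over plain iterators, the inputs can be open files or other streams rather than lists, and the caller can stop consuming at any point.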
Cheers,
Cameron Simpson <cs at zip.com.au>
Yesterday, I was running a CNC plasma cutter that's controlled by Windows XP.
This is a machine that moves around a plasma torch that cuts thick steel
plate. A "New Java update is available" window popped up while I was
working. Not good. - John Nagle