[Python-ideas] Possible new itertool: comm()

Cameron Simpson cs at zip.com.au
Wed Jan 7 03:14:40 CET 2015


On 06Jan2015 22:24, Antoine Pitrou <solipsis at pitrou.net> wrote:
>On Wed, 7 Jan 2015 08:09:59 +1100
>Cameron Simpson <cs at zip.com.au> wrote:
>> On 06Jan2015 19:36, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> >On Tue, 6 Jan 2015 18:22:44 +0000
>> >Paul Moore <p.f.moore at gmail.com> wrote:
>> >> On 6 January 2015 at 17:14, Raymond Hettinger
>> >> <raymond.hettinger at gmail.com> wrote:
>> >> >> On Jan 6, 2015, at 8:22 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> >> >>
>> >> >> In writing a utility script today, I found myself needing to do
>> >> >> something similar to what the Unix "comm" utility does - take two
>> >> >> sorted iterators, and partition the values into "only in the first",
>> >> >> "only in the second", and "in both" groups.
>> >> >
>> >> > As far as I can tell, this would be a very rare need.
>> >>
>> >> It's come up for me a few times, usually when trying to check two
>> >> lists of files to see which ones have been missed by a program, and
>> >> which ones the program thinks are present but no longer exist.
>> >
>> >Why don't you use sets for such things? Your iterator is really only
>> >useful for huge or unhashable inputs.
>>
>> In my use case (an existing tool):
>>
>> 1) I'm merging log files of arbitrary size; I am _not_ going to suck them into
>> memory. A comm()-like function has a tiny and fixed memory footprint, versus an
>> unbounded one.
>
>I don't understand what your use case has to do with comm().

I've got two!

Like another poster, I also very commonly compare two similar directory trees
for commonality, and do the equivalent comparisons on lists.

>If you
>just want to merge sorted iterators you don't need all the complication
>this function has.

Indeed not, but they are very similar tasks: you're pulling in two or more
streams of sorted input and classifying them. In my direct use case, everything
goes into the same output stream, in order. In comm(), it goes into three
streams (well, since the output is an iterator: one stream with three
classifications).

Cheers,
Cameron Simpson <cs at zip.com.au>

The double cam chain setup on the 1980's DOHC CB750 was another one of
Honda's pointless engineering breakthroughs. You know the cycle (if you'll
pardon the pun :-), Wonderful New Feature is introduced with much fanfare,
WNF is fawned over by the press, WNF is copied by the other three Japanese
makers (this step is sometimes optional), and finally, WNF is quietly dropped
by Honda.
        - Blaine Gardner, <blgardne at sim.es.com>
