itertools.intersect?

Jack Diederich jackdied at gmail.com
Wed Jun 10 19:59:18 EDT 2009


On Wed, Jun 10, 2009 at 6:24 PM, David Wilson<dw at botanicus.net> wrote:
> During a fun coding session yesterday, I came across a problem that I
> thought was already solved by itertools, but on investigation it seems
> it isn't.
>
> The problem is simple: given one or more ordered sequences, return
> only the objects that appear in each sequence, without reading the
> whole set into memory. This is basically an SQL many-many join.
>
> I thought it could be accomplished through recursively embedded
> generators, but that approach failed in the end. After posting the
> question to Stack Overflow[0], Martin Geisler proposed a wonderfully
> succinct and reusable solution (see below, or pretty printed at the
> Stack Overflow URL).
>
[snip]

Here's my version;  keep a list of (curr_val, iterator) tuples and
operate on those.

def intersect(seqs):
   iter_pairs = [(it.next(), it) for (it) in map(iter, seqs)]
   while True:
       min_val = min(iter_pairs)[0]
       max_val = max(iter_pairs)[0]
       if min_val == max_val:
           yield min_val
           max_val += 1 # everybody advances
       for i, (val, it) in enumerate(iter_pairs):
           if val < max_val:
               iter_pairs[i] = (it.next(), it)
   # end while True

Interestingly you don't need to explicitly catch StopIteration and
return because only the top level is a generator.  So both lines where
it.next() are called will potentially end the loop.
I also tried using a defaultdict(list) as the main structure; it
worked but was uglier by far { curr_val => [it1, it2, ..]} with dels
and appends.

-Jack

ps, woops, I forgot to hit reply all the first time.



More information about the Python-list mailing list