itertools.intersect?

David M. Wilson dw at botanicus.net
Wed Jun 10 23:56:55 EDT 2009


On Jun 11, 12:59 am, Jack Diederich <jackd... at gmail.com> wrote:
> On Wed, Jun 10, 2009 at 6:24 PM, David Wilson<d... at botanicus.net> wrote:
> > During a fun coding session yesterday, I came across a problem that I
> > thought was already solved by itertools, but on investigation it seems
> > it isn't.
>
> > The problem is simple: given one or more ordered sequences, return
> > only the objects that appear in each sequence, without reading the
> > whole set into memory. This is basically an SQL many-many join.
>
> > I thought it could be accomplished through recursively embedded
> > generators, but that approach failed in the end. After posting the
> > question to Stack Overflow[0], Martin Geisler proposed a wonderfully
> > succinct and reusable solution (see below, or pretty printed at the
> > Stack Overflow URL).
>
> [snip]
>
> Here's my version;  keep a list of (curr_val, iterator) tuples and
> operate on those.
>
> def intersect(seqs):
>    iter_pairs = [(it.next(), it) for (it) in map(iter, seqs)]
>    while True:
>        min_val = min(iter_pairs)[0]
>        max_val = max(iter_pairs)[0]
>        if min_val == max_val:
>            yield min_val
>            max_val += 1 # everybody advances
>        for i, (val, it) in enumerate(iter_pairs):
>            if val < max_val:
>                iter_pairs[i] = (it.next(), it)
>    # end while True

This version is a lot easier to understand. The implicit StopIteration
is a double-edged sword for readability, but I like it. :)


David

>
> Interestingly you don't need to explicitly catch StopIteration and
> return because only the top level is a generator.  So both lines where
> it.next() are called will potentially end the loop.
> I also tried using a defaultdict(list) as the main structure; it
> worked but was uglier by far { curr_val => [it1, it2, ..]} with dels
> and appends.
>
> -Jack
>
> ps, woops, I forgot to hit reply all the first time.




More information about the Python-list mailing list