itertools.intersect?
Jack Diederich
jackdied at gmail.com
Wed Jun 10 19:59:18 EDT 2009
On Wed, Jun 10, 2009 at 6:24 PM, David Wilson<dw at botanicus.net> wrote:
> During a fun coding session yesterday, I came across a problem that I
> thought was already solved by itertools, but on investigation it seems
> it isn't.
>
> The problem is simple: given one or more ordered sequences, return
> only the objects that appear in each sequence, without reading the
> whole set into memory. This is basically an SQL many-many join.
>
> I thought it could be accomplished through recursively embedded
> generators, but that approach failed in the end. After posting the
> question to Stack Overflow[0], Martin Geisler proposed a wonderfully
> succinct and reusable solution (see below, or pretty printed at the
> Stack Overflow URL).
>
[snip]
Here's my version; keep a list of (curr_val, iterator) tuples and
operate on those.
def intersect(seqs):
iter_pairs = [(it.next(), it) for (it) in map(iter, seqs)]
while True:
min_val = min(iter_pairs)[0]
max_val = max(iter_pairs)[0]
if min_val == max_val:
yield min_val
max_val += 1 # everybody advances
for i, (val, it) in enumerate(iter_pairs):
if val < max_val:
iter_pairs[i] = (it.next(), it)
# end while True
Interestingly you don't need to explicitly catch StopIteration and
return because only the top level is a generator. So both lines where
it.next() are called will potentially end the loop.
I also tried using a defaultdict(list) as the main structure; it
worked but was uglier by far { curr_val => [it1, it2, ..]} with dels
and appends.
-Jack
ps, woops, I forgot to hit reply all the first time.
More information about the Python-list
mailing list