[Tutor] newB Q: extracting common elements from 2 lists

Jeff Shannon jeff@ccvcorp.com
Mon, 24 Jun 2002 10:35:33 -0700


Andrei Kulakov wrote:

> On Mon, Jun 24, 2002 at 06:51:28AM -0700, Stephen Doughty wrote:
>
> > I have two tuples where the values are lists,
> > eg :
> > >>> xmap
> > {1: [1, 3, 5, 7, 9]}
> > >>> ymap
> > {1: [1, 2, 3, 4, 5]}
> >
> > what I want is to extract the common elements of the
> > two tuple values and put them in a new tuple 'list'
>
> One way is to do this:
>
> >>> a = [1, 3, 5, 7, 9]
> >>> b = [1, 2, 3, 4, 5]
> >>> ab = [x for x in a if x in b]
> >>> ab
> [1, 3, 5]

This works great as long as your lists are short, but if the lists you're
comparing are large (dozens to hundreds of elements each, or more), then
this can get *very* slow, because the 'in' operator has to look at each
element of the list -- the way this is constructed, you end up doing up
to len(a) * len(b) comparisons.  That's fine for these examples (the
above is at most 25 compares), but if each list is 20 elements long,
you're looking at up to 400 compares, and if they're 100 elements each,
that's 10,000 compares!
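
To see where that product comes from, the list comprehension above
behaves roughly like this nested loop (just an illustrative expansion of
what 'in' is doing behind the scenes, not what Python literally runs):

>>> ab = []
>>> for x in a:
...     for item in b:          # this inner scan is what 'x in b' does
...         if item == x:
...             ab.append(x)
...             break
...
>>> ab
[1, 3, 5]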

An alternate way of doing this, one that does *not* suffer from that
quadratic slowdown, is to take advantage of the properties of Python
dictionaries -- a dictionary lookup takes (roughly) constant time, no
matter how many elements are in the dictionary.  So, first we can
convert list a into a dictionary --

>>> tempdict = {}
>>> for item in a:
...     tempdict[item] = 1
...
>>>

(We're only worried about the existence of items, so we can just use 1
as the value for each element in the dictionary.)  Now, we go through
list b, and each item that's in list b *and* the dictionary gets added
to list ab --

>>> ab = []
>>> for item in b:
...     if tempdict.has_key(item):
...         ab.append(item)
...
>>>
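
Checking the result for the same example lists, we get the same answer
that the list comprehension gave:

>>> ab
[1, 3, 5]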

This method takes a bit more overhead than the approach that Andrei and
Alan showed, but it takes time that's proportional to the *sum* of the
list lengths, instead of the *product* of the list lengths -- for the
above example, with lists of length 5, you have 10 operations (though
each one is possibly slower than each of the 25 operations for the other
method).  However, with lists of length 20, you only need 40 operations,
and for 100-element lists, you only need 200 operations -- a far cry
from the 10,000 that the other method needed!
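
For what it's worth, here's a sketch of the whole thing rolled into one
function (the name intersect is just something I picked for
illustration); it should drop straight onto the original xmap/ymap
example:

>>> def intersect(a, b):
...     # build a lookup table from the first list
...     tempdict = {}
...     for item in a:
...         tempdict[item] = 1
...     # keep only the items of the second list that are in the table
...     result = []
...     for item in b:
...         if tempdict.has_key(item):
...             result.append(item)
...     return result
...
>>> intersect(xmap[1], ymap[1])
[1, 3, 5]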

Of course, this is all an optimization, and as such, we shouldn't be
*too* eager to make use of it -- premature optimization is a great
pitfall, and you should only optimize when you *know* that a particular
segment of code is a bottleneck.  I'm not saying that you *should* use
this dictionary method (if you expect your lists to stay in the 5-10
element size range, the other method is probably clearer and thus
better); I'm just showing you that there are other options that might
work better depending on your circumstances/requirements.

Jeff Shannon
Technician/Programmer
Credit International