Most efficient solution?

Christopher A. Craig com-nospam at
Tue Jul 17 15:49:54 CEST 2001

Hash: SHA1

** For this post presume that "a" is len(A) and "b" is len(B)

"Jeffery D. Collins" <jcollins at> writes:

> How about this:
> import operator
> h = {}
> map(operator.setitem, [h]*len(B), B, B)  # create a dictionary of B
> bb = filter(h.get, A)  # find all elements of A in B (include repeats)
> map(A.remove, bb)

This is very elegant, but incorrect.  Since you are setting h[B]=B you
have no guarantee that h.get(i) is true if h[i] exists.  

import operator
h = {}
map(operator.setitem, [h]*len(B), B, [None]*len(B))
bb = filter(h.get, A)
map(A.remove, bb)

Additionally, A.remove() is O(a) so this ends up
taking O(a*b), which is bad.  Also we can avoid having to create the
list [None]*len(B) by using has_key() instead of get().
You could fix this with:

import operator
h = {}
map(operator.setitem, [h]*len(B), B, B)
A= [ x for x in A if not h.has_key(x) ]

Which is pretty good, but possibly overly clever.  My tests (Python
2.1 on an Ultra 10 with Solaris 8) show that

h = {}
for i in B: h[i]=None
A=[ x for x in A if not h.has_key(x)]

outperforms this (very) slightly (probably because the creation of
[h]*len(B) is more expensive than the explicit for loop).

- -- 
Christopher A. Craig <com-nospam at>
"It's the one thing I understand, I guess." Bill Gates, about BASIC
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt


More information about the Python-list mailing list