Pythonic search of list of dictionaries
Skip Montanaro
skip at pobox.com
Tue Jan 4 13:14:06 EST 2005
Bulba> I put those dictionaries into the list:
Bulba> oldl=[x for x in orig] # where orig=csv.DictReader(ofile ...
Bulba> ..and then search for matching source terms in two loops:
Bulba> for o in oldl:
Bulba> for n in newl:
Bulba> if n['English'] == o['English']:
Bulba> ...
Bulba> Now, this works. However, not only this is very un-Pythonic, but
Bulba> also very inefficient: the complexity is O(n**2), so it scales up
Bulba> very badly.
How about using sets?
oenglish = set([item['English'] for item in oldl])
nenglish = set([item['English'] for item in newl])
matching = oenglish & nenglish
Once you have those that match, you can constrain your outer loop to just
those cases where
o['English'] in matching
If you're not using 2.4 yet, then get sets via:
from sets import Set as set
That's still not all that Pythonic, but should be a bit faster.
You might want to sort your lists by the 'English' key. I don't know how to
use the new key arg to list.sort(), but you can still do it the
old-fashioned way:
oldl.sort(lambda a,b: cmp(a['English'], b['English']))
newl.sort(lambda a,b: cmp(a['English'], b['English']))
Once sorted, you can then march through the lists in parallel, which should
give you an O(n) algorithm.
Skip
More information about the Python-list
mailing list