Strategy for determining difference between 2 very large dictionaries

Peter Otten __peter__ at web.de
Wed Dec 24 09:46:51 CET 2008


Gabriel Genellina wrote:

> On Wed, 24 Dec 2008 05:16:36 -0200, <python at bdurham.com> wrote:

[I didn't see the original post]

>> I'm looking for suggestions on the best ('Pythonic') way to
>> determine the difference between 2 very large dictionaries
>> containing simple key/value pairs.
>> By difference, I mean a list of keys that are present in the
>> first dictionary, but not the second. And vice versa. And a list
>> of keys in common between the 2 dictionaries whose values are
>> different.
>> The 2 strategies I'm considering are:
>> 1. Brute force: Iterate through first dictionary's keys and
>> determine which keys it has that are missing from the second
>> dictionary. If keys match, then verify that the 2 dictionaries
>> have identical values for the same key. Repeat this process for
>> the second dictionary.
>> 2. Use sets: Create sets from each dictionary's list of keys and
>> use Python's set methods to generate a list of keys present in
>> one dictionary but not the other (for both dictionaries) as well
>> as a set of keys the 2 dictionaries have in common.
> 
> I cannot think of any advantage of the first approach - so I'd use sets.
> 
> k1 = set(dict1.iterkeys())
> k2 = set(dict2.iterkeys())
> k1 - k2 # keys in dict1 not in dict2
> k2 - k1 # keys in dict2 not in dict1
> k1 & k2 # keys in both
> 
>> Using the set
>> of keys in common, compare values across dictionaries to
>> determine which keys have different values (can this last step be
>> done via a simple list comprehension?)
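
On the list comprehension question: yes, that last step fits in one. A minimal sketch, assuming Python 3 (where dict.keys() returns a set-like view, so the explicit set(...) calls become optional); dict1 and dict2 hold made-up sample data:

```python
# Hypothetical sample data, not from the original post.
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 2, "c": 30, "d": 4}

# In Python 3, key views support set operations directly.
only_in_1 = dict1.keys() - dict2.keys()   # keys in dict1 not in dict2
only_in_2 = dict2.keys() - dict1.keys()   # keys in dict2 not in dict1
common = dict1.keys() & dict2.keys()      # keys in both

# Keys present in both dictionaries whose values differ.
changed = [k for k in common if dict1[k] != dict2[k]]

print(sorted(only_in_1))  # ['a']
print(sorted(only_in_2))  # ['d']
print(changed)            # ['c']
```

This only requires the keys to be hashable (which they already are); the values are compared with != and never put into a set.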

If you are not interested in the intermediate results and the dictionary
values are hashable, you can get the difference from the symmetric
difference of the two item sets:

>>> a = dict(a=1, b=2, c=3)
>>> b = dict(b=2, c=30, d=4)
>>> dict(set(a.iteritems()) ^ set(b.iteritems()))
{'a': 1, 'c': 3, 'd': 4}
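
Note that the symmetric difference contains both ('c', 3) and ('c', 30), so dict() keeps only one of them, and which one survives depends on the set's iteration order. If both values are wanted, something along these lines may be closer to what was asked for. This is a sketch assuming Python 3; dict_diff is a hypothetical helper name, not a standard function:

```python
def dict_diff(a, b):
    """Return (only_in_a, only_in_b, changed) for two dictionaries.

    `changed` maps each shared key whose values differ to the
    pair (value_in_a, value_in_b).
    """
    only_in_a = {k: a[k] for k in a.keys() - b.keys()}
    only_in_b = {k: b[k] for k in b.keys() - a.keys()}
    changed = {
        k: (a[k], b[k])
        for k in a.keys() & b.keys()
        if a[k] != b[k]
    }
    return only_in_a, only_in_b, changed


a = dict(a=1, b=2, c=3)
b = dict(b=2, c=30, d=4)
print(dict_diff(a, b))  # ({'a': 1}, {'d': 4}, {'c': (3, 30)})
```

Unlike the one-liner, this does not need the values to be hashable, since only the keys go through set operations.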

Peter


