[Tutor] generating unique set of dicts from a list of dicts

bruce badouglas at gmail.com
Tue Jan 10 21:24:46 CET 2012


trying to figure out how to generate a unique set of dicts from a
json/list of dicts.

initial list :::
[{"pStart1a": {"termVal":"1122","termMenu":"CLASS_SRCH_WRK2_STRM","instVal":"OSUSI",
"instMenu":"CLASS_SRCH_WRK2_INSTITUTION","goBtn":"CLASS_SRCH_WRK2_SSR_PB_SRCH",
"pagechk":"CLASS_SRCH_WRK2_SSR_PB_SRCH","nPage":"CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH"},
"pSearch1a":
{"chk":"CLASS_SRCH_WRK2_MON","srchbtn":"DERIVED_CLSRCH_SSR_EXPAND_COLLAPS"}},
 {"pStart1":""},
 {"pStart1a":{"termVal":"1122","termMenu":"CLASS_SRCH_WRK2_STRM","instVal":"OSUSI",
 "instMenu":"CLASS_SRCH_WRK2_INSTITUTION","goBtn":"CLASS_SRCH_WRK2_SSR_PB_SRCH",
 "pagechk":"CLASS_SRCH_WRK2_SSR_PB_SRCH","nPage":"CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH"},
 "pSearch1a":
 {"chk":"CLASS_SRCH_WRK2_MON","srchbtn":"DERIVED_CLSRCH_SSR_EXPAND_COLLAPS"}},
 {"pStart1":""}]



As an exmple, the following is the test list:

[{"pStart1a": {"termVal":"1122","termMenu":"CLASS_SRCH_WRK2_STRM","instVal":"OSUSI",
"instMenu":"CLASS_SRCH_WRK2_INSTITUTION","goBtn":"CLASS_SRCH_WRK2_SSR_PB_SRCH",
"pagechk":"CLASS_SRCH_WRK2_SSR_PB_SRCH","nPage":"CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH"},
"pSearch1a":
{"chk":"CLASS_SRCH_WRK2_MON","srchbtn":"DERIVED_CLSRCH_SSR_EXPAND_COLLAPS"}},
 {"pStart1":""},
 {"pStart1a":{"termVal":"1122","termMenu":"CLASS_SRCH_WRK2_STRM","instVal":"OSUSI",
 "instMenu":"CLASS_SRCH_WRK2_INSTITUTION","goBtn":"CLASS_SRCH_WRK2_SSR_PB_SRCH",
 "pagechk":"CLASS_SRCH_WRK2_SSR_PB_SRCH","nPage":"CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH"},
 "pSearch1a":
 {"chk":"CLASS_SRCH_WRK2_MON","srchbtn":"DERIVED_CLSRCH_SSR_EXPAND_COLLAPS"}},
 {"pStart1":""}]

Trying to get the following, list of unique dicts, so there aren't
duplicate dicts.
 Searched various sites/SO.. and still have a mental block.

[
  {"pStart1a":
  {"termVal":"1122","termMenu":"CLASS_SRCH_WRK2_STRM","instVal":"OSUSI",
   "instMenu":"CLASS_SRCH_WRK2_INSTITUTION","goBtn":"CLASS_SRCH_WRK2_SSR_PB_SRCH",
   pagechk":"CLASS_SRCH_WRK2_SSR_PB_SRCH","nPage":"CLASS_SRCH_WRK2_SSR_PB_CLASS_SRCH"},
  "pSearch1a":
  {"chk":"CLASS_SRCH_WRK2_MON","srchbtn":"DERIVED_CLSRCH_SSR_EXPAND_COLLAPS"}},
  {"pStart1":""}]

I was considering iterating through the initial list, copying each
dict into a new list, and doing a basic comparison, adding the next
dict if it's not in the new list.. is there another/better way?

posted this to StackOverflow as well.  >>>>
http://stackoverflow.com/questions/8808286/simplifying-a-json-list-to-the-unique-dict-items
 <<<

There was a potential soln that I couldn't understand.

	
-------------------------
The simplest approach -- using list(set(your_list_of_dicts)) won't
work because Python dictionaries are mutable and not hashable (that
is, they don't implement __hash__). This is because Python can't
guarantee that the hash of a dictionary won't change after you insert
it into a set or dict.

However, in your case, since you (don't seem to be) modifying the data
at all, you can compute your own hash, and use this along with a
dictionary to relatively easily find the unique JSON objects without
having to do a full recursive comparison of each dictionary to the
others.

First, we need a function to compute a hash of the dictionary. Rather
than trying to build our own hash function, let's use one of the
built-in ones from hashlib:

def dict_hash(d):
    out = hashlib.md5()
    for key, value in d.iteritems():
        out.update(unicode(key))
        out.update(unicode(value))
    return out.hexdigest()

(Note that this relies on unicode(...) for each of your values
returning something unique -- if you have custom classes in the
dictionaries whose __unicode__ returns something like "MyClass
instance", this will fail or will require modification. Also, in your
example, your dictionaries are flat, but I'll leave it as an exercise
to the reader how to expand this solution to work with dictionaries
that contain other dicts or lists.)

Since dict_hash returns a string, which is immutable, you can now use
a dictionary to find the unique elements:

uniques_map = {}
for d in list_of_dicts:
    uniques[dict_hash(d)] = d
unique_dicts = uniques_map.values()

>>>>*** not sure what the "uniqes" is, or what/how it should be defined....


thoughts/comments are welcome

thanks


More information about the Tutor mailing list