Doing set operation on non-hashable objects

5lvqbwl02 at sneakemail.com 5lvqbwl02 at sneakemail.com
Fri Dec 26 20:28:10 EST 2008


On Dec 24, 12:52 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Dec 24, 1:16 pm, 5lvqbw... at sneakemail.com wrote:
>
>
>
>
>
> > Hi,
>
> > I'm writing an application which is structured roughly as follows:
>
> > "db" is a dict, where the values are also dicts.
> > A function searches through db and returns a list of values, each of
> > which is a dict as described above.
> > I need to perform set operations on these lists (intersection and
> > union)
> > However the objects themselves are not hashable, and therefore can't
> > be in a set, because they are dicts.
> > I'm not sure how large each set will be, but the point is I'm trying
> > to avoid anything looking like an O(n^2) algorithm, so I can't just
> > use naive double-looping to check for intersection/union on a pair of
> > lists.
>
> > The only way I can think of to do this right is to hash the dicts by
> > freezing them, turning them all into tuples, which can then be
> > hashed.  But this is a problem because more dicts might be nested
> > inside.  At any rate, I'd need to thaw them out when I'm done and turn
> > them back into dicts, which seems complicated and annoying, and all
> > this leads me to believe there is a better way.
>
> > What I really need is a set of pointers, so at the end of the
> > operation I have the actual objects pointed to.  Can I somehow use the
> > object IDs as set elements, then recreate a list with the actual
> > objects they point to?
>
> > How do you get the object back from an ID in python?
>
> Quick and dirty way is to reference the dicts with a lightweight
> hashable object that uses the dict's identity.  For instance:
>
> class Identity(object):
>     def __init__(self,obj):
>         self.obj = obj
>     def __hash__(self):
>         return hash(id(self.obj))
>     def __eq__(self,other):
>         return self.obj is other.obj
>
> Then, instead of "s.add(d)" use "s.add(Identity(d))".
>
> Instead of "d = s.pop()" use "d = s.pop().obj".
>
> Instead of "d in s" use "Identity(d) in s".
>
> And so on.
>
> To do it a bit better, you could create an IdentitySet class that
> subclasses set and wraps the methods so that they automatically apply
> wrap and unwrap the arguments on their way in and out (I'd bet there's
> already a cookbook recipe to do that).
>
> Carl Banks

Thank you, I like this idea a lot.  It's close to what I've been
thinking but a bit cleaner.

Michael



More information about the Python-list mailing list