Intersection of lists/sets -- with a catch

Carl Banks invalidemail at
Wed Oct 19 00:10:43 CEST 2005

James Stroud wrote:
> Hello All,
> I find myself in this situation from time to time: I want to compare two lists
> of arbitrary objects and (1) find those unique to the first list, (2) find
> those unique to the second list, (3) find those that overlap. But here is the
> catch: comparison is not straightforward. For example, I will want to
> compare 2 objects based on a set of common attributes. These two objects need
> not be members of the same class, etc. A function might help to illustrate:
> def test_elements(element1, element2):
>   """
>   Returns bool.
>   """
>   # any evaluation can follow
>   return (element1.att_a == element2.att_a) and \
>          (element1.att_b == element2.att_b)


> It's probably obvious to everyone that this type of task seems perfect for
> sets. However, it does not seem that sets can be used in the following way,
> using a hypothetical "comparator" function. The "comparator" would be
> analogous to a function passed to the list.sort() method. Such a device would
> crush the previous code to the following very straightforward statements:
> some_set = Set(some_list, comparator=test_elements)
> another_set = Set(another_list, comparator=test_elements)
> overlaps = some_set.intersection(another_set)
> unique_some = some_set.difference(another_set)
> unique_another = another_set.difference(some_set)
> I am of the opinion that such a modification to the set type would
> make it vastly more flexible, if it does not already have this ability.
> Any thoughts on how I might accomplish either technique or any thoughts on how
> to make my code more straightforward would be greatly appreciated.

How about something like this (untested):

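class CmpProxy(object):
    """Wrap an object so that sets compare it by (att_a, att_b)."""
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, other):
        # compare the same attribute on both sides
        return (self.obj.att_a == other.obj.att_a
                and self.obj.att_b == other.obj.att_b)
    def __hash__(self):
        # proxies that compare equal must also hash equal
        return hash((self.obj.att_a, self.obj.att_b))

# list_a and list_b are the two lists being compared
set_a = set(CmpProxy(x) for x in list_a)
set_b = set(CmpProxy(y) for y in list_b)
# unwrap the proxies to get back the original objects
overlaps = [z.obj for z in set_a.intersection(set_b)]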
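The proxy idea generalizes: instead of hard-coding att_a and att_b, the proxy can take a key function, which plays the role the hypothetical "comparator" would have played, with the restriction that it must return something hashable. A sketch under that assumption; KeyProxy and compare_lists are hypothetical names, and operator.attrgetter is used to build the key:

```python
from collections import namedtuple
from operator import attrgetter
from types import SimpleNamespace


class KeyProxy(object):
    """Hypothetical wrapper: sets compare and hash it by key(obj)."""

    def __init__(self, obj, key):
        self.obj = obj
        self.key = key(obj)  # computed once; must be hashable

    def __eq__(self, other):
        return isinstance(other, KeyProxy) and self.key == other.key

    def __hash__(self):
        return hash(self.key)


def compare_lists(list_a, list_b, key):
    """Return (overlaps, unique_a, unique_b) as lists of original objects."""
    set_a = set(KeyProxy(x, key) for x in list_a)
    set_b = set(KeyProxy(y, key) for y in list_b)
    # intersection() may keep the proxy from either operand, so take
    # the overlapping objects from set_a explicitly
    common = set_a & set_b
    overlaps = [p.obj for p in set_a if p in common]
    unique_a = [p.obj for p in set_a - set_b]
    unique_b = [p.obj for p in set_b - set_a]
    return overlaps, unique_a, unique_b


# two different classes that share att_a and att_b
Rec = namedtuple("Rec", "att_a att_b")
list_a = [Rec(1, "x"), Rec(2, "y")]
list_b = [SimpleNamespace(att_a=2, att_b="y", note="other class"),
          SimpleNamespace(att_a=3, att_b="z")]

# attrgetter with two names returns the tuple (att_a, att_b)
overlaps, unique_a, unique_b = compare_lists(
    list_a, list_b, attrgetter("att_a", "att_b"))
print(overlaps)  # [Rec(att_a=2, att_b='y')]
print(unique_a)  # [Rec(att_a=1, att_b='x')]
```

Because these are sets, objects within one list that share a key collapse to a single entry, and the result order is arbitrary; if either matters, the brute-force pairwise approach is the safer choice.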

Carl Banks

More information about the Python-list mailing list