intersection of 2 list of pairs

John Machin sjmachin at lexicon.net
Mon Feb 21 17:46:28 EST 2005


Pierre Quentel wrote:
> Another method is to build two sets of sets, one for E1 and one for
E2,
> then make the intersection of these sets
>
> - with Python 2.3
>
>  >>> E1=[('a','g'),('r','s')]
>  >>> E2=[('g','a'),('r','q'),('f','h')]
>  >>> from sets import Set,ImmutableSet
>  >>> f=Set([ImmutableSet(s) for s in E1])& Set([ImmutableSet(s) for s
in
> E2])
>  >>> [tuple(x) for x in f]
> [('a', 'g')]
>
> - with Python 2.4
>
>  >>> E1=[('a','g'),('r','s')]
>  >>> E2=[('g','a'),('r','q'),('f','h')]
>  >>> f=set([frozenset(s) for s in E1]) & set([frozenset(s) for s in
E2])
>  >>> [tuple(x) for x in f]
> [('a', 'g')]
>
> You must use ImmutableSet or frozenset to be able to create a set of
sets
>
> Pierre

The problem with using frozenset([x,y)) -- as compared to
(min(x,y),max(x,y))-- is that you don't get an *understandable*
canonical representation of the pair:

>>> frozenset(('a','b'))
frozenset(['a', 'b'])
>>> frozenset(('b','c'))
frozenset(['c', 'b'])

i.e. it depends on the Python hash function for the type/class being
used. Try explaining that to the first person you meet at the bus stop.
Exceptions prove the rule: if the OP regularly caught the same bus as
the timbot he wouldn't be asking questions like this :-)

In any case the OP still has a problem; when ('a','g') -- or ('g', 'a')
--pops out as the answer, he still doesn't know which list(s) the
duplicates are in, nor which way they are represented. To do something
useful, the output would have to be a list of lists, with the inner
list representing a cluster of duplicates. Each duplicate would be
represented by a unique identifier -- such as a tuple: (list_number,
index_in_that_list) -- so that it can be retrieved, inspected,
jettisoned, ...




More information about the Python-list mailing list