Merging lists has made my brain hurt.

Alex Martelli aleax at aleax.it
Sun Oct 6 15:05:17 EDT 2002


Thorsten Kampe wrote:
        ...
> Treating lists a priori as sets for reasons of speed is definitely not
> the way to go because you're "losing information". I think it's better

Of course, you have to know which information matters.

> to treat them as tuples (in a mathematical sense) whereby you can
> always delete 'duplicates' afterwards if you don't care about them.

If you don't care about duplicates you're much better off normalizing
first, and if you care about COUNT of duplicates but not ordering (i.e.,
a bag) you're generally still better off normalizing first.

> Example: What's the intersection between [2, 1, 2, 3, 2] and
> [0, 1, 2, 3, 2, 4]? An intersection (union, symmetric difference) of
> tuples can only be meaningful for the /corresponding/ items of the
> tuples/lists, so it's neither [2, 1, 3] nor [2, 1, 2, 3, 2] but
> [2, 1, 2, 3].

But why not [1, 2, 3, 2] instead?  I know of no algebras defining
operations called intersection &c that are not commutative.  What
you mean by "corresponding" there is rather mysterious to me,
but wouldn't it mean "same item in the SAME position"?


> I tried to write a reliable (though maybe slow) program that takes
> care of this issue (especially that intersection is _commutative_):

But it's NOT -- [1, 2, 3, 2] != [2, 1, 2, 3].  You're losing information,
specifically order information.  A correct definition should be:

def intersect(seq1, seq2):
    return [ x for x, y in zip(seq1, seq2) if x == y ]

If what you want is to treat sequences as BAGS, then they're better
represented by dicts again -- order is the only information present
in a list and not in a dict, and you don't seem to consider it anyway.
So to do extensive processing of seqs-as-bags have converter funcs:

def seqToBag(seq):
    bag = {}
    for item in seq: bag[item] = 1 + bag.get(item, 0)

def bagToSeq(bag):
    return [ x for key in bag for x in [key]*bag[key]]

and for example "intersection" of bags then becomes, e.g:

def intersectBags(bag1, bag2):
    result = {}
    for item in bag1:
        count = min(bag1[item], bag2.get(item, 0))
        if count>0: result[item] = count
    return result


Alex




More information about the Python-list mailing list