Merging lists has made my brain hurt.

Thorsten Kampe thorsten at thorstenkampe.de
Tue Oct 8 05:52:01 EDT 2002


*  Thorsten Kampe
> * Alex Martelli
>> Thorsten Kampe wrote:
[massive snip]
>> have another auxiliary function:
>>     def binop(seq0, seq1, op):
>>         count0 = tocount(seq0)
>>         count1 = tocount(seq1)
>>         if len(count0) > len(count1):
>>             count1, count0 = count0, count1
>>         for item in count0:
>>             count0[item] = op(count0[item], count1.get(item, 0))
>> 
>> and implement your 'and' by returning binop(seq0, seq1, min) and
>> your 'or' by returning binop(seq0, seq1, max) -- I think this also
>> clarifies the parallelism between the two cases.
>
> It's not that easy. Union and symmetric difference take elements of
> both sequences (e. g. union([11, 22], [22, 33])) so you have to loop
> through both sequences!
>
> So for the union, I think my "seq0 + (seq1 - seq0)" is shorter (and
> probably faster).

Okay, I rewrote the "seq0 + (seq1 - seq0)" part:
#v+
elif boolean_modus == 'or':   # union
    union = {}
    for item in count0:
        union[item] = max(count0[item], count1.get(item, 0))
    for item in count1:
        union[item] = max(count0.get(item, 0), count1[item])
    return toseq(union)
#v-

Like I predicted, with "seq0 = seq1 = [1] * 1000000" it's about a 0.8
sec slower and with "seq0 = [1] * 1000000, seq1 = range(1000000)" it's
about 3.7 sec slower.

>>>     elif boolean_modus == 'xor':  # symmetric difference
>>>         return boolean(boolean(seq0, seq1, 'or'), boolean(seq0, seq1,
>>>         'and'), 'not')
>> 
>> This one could also be more speedily implemented with the already
>> presented binop and another little auxiliary function:
>> 
>>     def maxminusmin(onecount, anothercount):
>>         return max(onecount, anothercount) - min(onecount, anothercount)
>
> That's abs(onecount - anothercount), isn't it?
>
>> and your 'xor' is implementable as
>>     return binop(seq0, seq1, maxminusmin)
>
> Same as for union: I'd have to loop through both lists, but that's
> cheap compared to the triple todict -> to seq conversion I do at the
> moment.

I rewrote this, too:
#v+
elif boolean_modus == 'xor':  # symmetric difference
    symdiff = {}
    for item in count0:
        symdiff[item] = abs(count0[item] - count1.get(item, 0))
    for item in count1:
        symdiff[item] = abs(count0.get(item, 0) - count1[item])
    return toseq(symdiff)
#v-

With the same test as with union, it's three and two times as fast.

Thanks!


Thorsten



More information about the Python-list mailing list