efficient list merging

Bengt Richter bokr at oz.net
Tue Sep 3 18:32:30 EDT 2002


On Tue, 3 Sep 2002 13:27:53 -0700, "Ken Seehof" <kseehof at neuralintegrator.com> wrote:

>> [Peter Saffrey]
>>
>> > I have two lists of lists of strings which I would like to merge
>> > efficiently without repeats.  [...] Or is there a better way?
>>
>> I am not sure about the relative speed of everything, but I would be
>> tempted to try:
>>
>>    dict(zip(list1 + list2, [None] * (len(list1) + len(list2)))).keys()
>>
>> It seems to work:
>>
>> >>> list1 =3D list('abcdebxyz')
>> >>> list2 =3D list('pqqr')
>> >>> list1
>> ['a', 'b', 'c', 'd', 'e', 'b', 'x', 'y', 'z']
>> >>> list2
>> ['p', 'q', 'q', 'r']
>> >>> dict(zip(list1 + list2, [None] * (len(list1) + len(list2)))).keys()
>> ['a', 'c', 'b', 'e', 'd', 'q', 'p', 'r', 'y', 'x', 'z']
>>
>> --
>> Fran=E7ois Pinard   http://www.iro.umontreal.ca/~pinard
>
>This is a bit faster, but assumes that the first list does not initially
>contain duplicates.  It eliminates duplicates that would be introduced by
>merging items from list2.  Also, this maintains the original list order
>(i.e. list1+list2 without adding duplicates).
>
>def merge(list1,list2):
>    d1 =3D dict(zip(list1,list1))
If you don't care about a throwaway list of None's (and why should you,
if you don't care about a throwaway list of tuples ;-), you could also write
instead of the above line:
     d1={}; map(d1.setdefault, list1)
>    return list1+[s for s in list2 if s not in d1]
>

I think you all missed "lists of lists" of strings in the original problem statement,
just as I did. But adding my 2 cents' worth of solution to the wrong problem, I'd want
to avoid generating a lot of temp lists in case the imput was really large. E.g.,

 >>> list1 = list('abcdebxyz')
 >>> list2 = list('pqqr')
 >>> d={}
 >>> d=reduce(lambda d,item: d.setdefault(item) or d,list1,d)
 >>> d=reduce(lambda d,item: d.setdefault(item) or d,list2,d)
 >>> d.keys()
 ['a', 'c', 'b', 'e', 'd', 'q', 'p', 'r', 'y', 'x', 'z']

Or, for a one-liner,

 >>> dict(map(None,list1+list2,[None])).keys()
 ['a', 'c', 'b', 'e', 'd', 'q', 'p', 'r', 'y', 'x', 'z']

Regards,
Bengt Richter



More information about the Python-list mailing list