efficient list merging
Bengt Richter
bokr at oz.net
Tue Sep 3 18:32:30 EDT 2002
On Tue, 3 Sep 2002 13:27:53 -0700, "Ken Seehof" <kseehof at neuralintegrator.com> wrote:
>> [Peter Saffrey]
>>
>> > I have two lists of lists of strings which I would like to merge
>> > efficiently without repeats. [...] Or is there a better way?
>>
>> I am not sure about the relative speed of everything, but I would be
>> tempted to try:
>>
>> dict(zip(list1 + list2, [None] * (len(list1) + len(list2)))).keys()
>>
>> It seems to work:
>>
>> >>> list1 =3D list('abcdebxyz')
>> >>> list2 =3D list('pqqr')
>> >>> list1
>> ['a', 'b', 'c', 'd', 'e', 'b', 'x', 'y', 'z']
>> >>> list2
>> ['p', 'q', 'q', 'r']
>> >>> dict(zip(list1 + list2, [None] * (len(list1) + len(list2)))).keys()
>> ['a', 'c', 'b', 'e', 'd', 'q', 'p', 'r', 'y', 'x', 'z']
>>
>> --
>> Fran=E7ois Pinard http://www.iro.umontreal.ca/~pinard
>
>This is a bit faster, but assumes that the first list does not initially
>contain duplicates. It eliminates duplicates that would be introduced by
>merging items from list2. Also, this maintains the original list order
>(i.e. list1+list2 without adding duplicates).
>
>def merge(list1,list2):
> d1 =3D dict(zip(list1,list1))
If you don't care about a throwaway list of None's (and why should you,
if you don't care about a throwaway list of tuples ;-), you could also write
instead of the above line:
d1={}; map(d1.setdefault, list1)
> return list1+[s for s in list2 if s not in d1]
>
I think you all missed "lists of lists" of strings in the original problem statement,
just as I did. But adding my 2 cents' worth of solution to the wrong problem, I'd want
to avoid generating a lot of temp lists in case the imput was really large. E.g.,
>>> list1 = list('abcdebxyz')
>>> list2 = list('pqqr')
>>> d={}
>>> d=reduce(lambda d,item: d.setdefault(item) or d,list1,d)
>>> d=reduce(lambda d,item: d.setdefault(item) or d,list2,d)
>>> d.keys()
['a', 'c', 'b', 'e', 'd', 'q', 'p', 'r', 'y', 'x', 'z']
Or, for a one-liner,
>>> dict(map(None,list1+list2,[None])).keys()
['a', 'c', 'b', 'e', 'd', 'q', 'p', 'r', 'y', 'x', 'z']
Regards,
Bengt Richter
More information about the Python-list
mailing list