[Tutor] Re: Comparing lines in two files, writing result into a third

pan@uchicago.edu pan@uchicago.edu
Wed Apr 23 17:09:02 2003


Hi Danny and Stuart,

IMO, Stuart's question can be solved without worrying about the
sort order. To me the following steps are more straightforward:

1. load the files into 2 lists (a,b)
2. concatenate them (a+b)
3. build a new list c from a+b. When building, check
   how often each element occurs in a+b. If count > 1, then attach *
4. remove duplicate items from c, giving the list d.

Put into code:

-------------------------------------------------

f1=open('c:/py/digest/f1.txt', 'r')
f2=open('c:/py/digest/f2.txt', 'r')
f3=open('c:/py/digest/f3.txt', 'w')

a= [x.strip() for x in f1.readlines()]  # a= ['1','3','4', '6']
b= [x.strip() for x in f2.readlines()]  # b= ['1','2','3','4','5']
c = [((a+b).count(x)>1) and (x+'*\n') or (x+'\n') for x in a+b]

seen = {}  # named 'seen' rather than 'set' so it cannot shadow the set type
d= [seen.setdefault(x,x) for x in c if x not in seen]  # Remove duplicates
d.sort()

f3.writelines(d)
f1.close()
f2.close()
f3.close()

-------------------------------------------------

It takes only 13 lines of code in total. The lists a,b,c,d will be:

a= ['1', '3', '4', '6']
b= ['1', '2', '3', '4', '5']
c= ['1*\n', '3*\n', '4*\n', '6\n', '1*\n', '2\n', '3*\n', '4*\n', '5\n']
d= ['1*\n', '2\n', '3*\n', '4*\n', '5\n', '6\n']
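
The same steps can also be tried on in-memory lists, without touching any
files. This is only a sketch under assumptions: the function name
mark_common is made up for illustration, and it uses collections.Counter
(from later Python versions than the one this post was written for) so that
each element is counted once instead of calling count() for every item.

```python
from collections import Counter

def mark_common(a, b):
    """Return sorted, de-duplicated lines; a line that occurs
    more than once in a+b gets a trailing '*'."""
    combined = a + b
    counts = Counter(combined)      # occurrences of each element in a+b
    marked = [x + '*\n' if counts[x] > 1 else x + '\n' for x in combined]
    return sorted(set(marked))      # drop duplicates, then sort

a = ['1', '3', '4', '6']
b = ['1', '2', '3', '4', '5']
d = mark_common(a, b)
```

Like the code above, this also marks an element that is repeated within a
single file, because the count is taken over a+b.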

# The 'duplicate removing code' is from :
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
#   by Raymond Hettinger, 2002/03/17
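
To see what the duplicate-removing line does on its own, here is a small
sketch. The helper name unique_in_order is hypothetical (it is not part of
the recipe); the body is the same setdefault idiom, wrapped in a function
and applied to the list c shown above.

```python
def unique_in_order(items):
    """Keep only the first occurrence of each item, preserving order."""
    seen = {}
    # seen.setdefault(x, x) stores x in the dict and returns it; the
    # 'if x not in seen' filter skips anything that is already stored.
    return [seen.setdefault(x, x) for x in items if x not in seen]

c = ['1*\n', '3*\n', '4*\n', '6\n', '1*\n', '2\n', '3*\n', '4*\n', '5\n']
result = unique_in_order(c)
```

The result keeps the first-seen order, which is why the code above still
needs d.sort() afterwards.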


HTH
pan