[Tutor] Re: Comparing lines in two files, writing result into a third
pan@uchicago.edu
Wed Apr 23 17:09:02 2003
Hi Danny and Stuart,
IMO, Stuart's question can be solved without worrying about the
sort order. To me the following steps are more straightforward:
1. load the files into two lists (a, b)
2. concatenate them (a+b)
3. build a new list c from a+b. While building, check
how many times each element occurs in a+b. If count > 1, append *
4. remove the duplicate items from c to get list d, then sort d.
Put into code:
-------------------------------------------------
f1=open('c:/py/digest/f1.txt', 'r')
f2=open('c:/py/digest/f2.txt', 'r')
f3=open('c:/py/digest/f3.txt', 'w')
a= [x.strip() for x in f1.readlines()] # a= ['1','3','4', '6']
b= [x.strip() for x in f2.readlines()] # b= ['1','2','3','4','5']
c = [((a+b).count(x)>1) and (x+'*\n') or (x+'\n') for x in a+b]
seen = {}  # named 'seen' so it does not shadow the built-in set
d= [seen.setdefault(x,x) for x in c if x not in seen] # Remove duplicates
d.sort()
f3.writelines(d)
f1.close()
f2.close()
f3.close()
-------------------------------------------------
In total it takes only 13 lines of code. The lists a, b, c, d will be:
a= ['1', '3', '4', '6']
b= ['1', '2', '3', '4', '5']
c= ['1*\n', '3*\n', '4*\n', '6\n', '1*\n', '2\n', '3*\n', '4*\n', '5\n']
d= ['1*\n', '2\n', '3*\n', '4*\n', '5\n', '6\n']
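For larger files, calling count() once per element gets slow, since every call rescans the whole combined list. Below is a rough sketch of the same marking idea using collections.Counter, which counts everything in one pass. The function name mark_common is made up for illustration and is not part of the code above. Like the count-based version, it also stars a line that is repeated inside a single file.

```python
from collections import Counter

def mark_common(a, b):
    """Return the unique items of a+b as sorted lines; items seen more than once get a '*'."""
    counts = Counter(a + b)  # one pass over the combined list
    # The keys of counts are already unique, so no separate de-duplication step is needed.
    return sorted((x + '*\n') if counts[x] > 1 else (x + '\n') for x in counts)

a = ['1', '3', '4', '6']
b = ['1', '2', '3', '4', '5']
d = mark_common(a, b)
```

With the sample lists, d should come out the same as the list d shown above.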
# The 'duplicate removing code' is from:
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
# by Raymond Hettinger, 2002/03/17
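As an aside, on a recent Python (3.7 or later, where dicts keep insertion order) the same first-occurrence de-duplication can be sketched with dict.fromkeys. This is an alternative to the recipe, not the recipe itself, and the helper name unique_in_order is made up for illustration.

```python
def unique_in_order(items):
    """Drop repeated items, keeping the first occurrence of each."""
    # dict keys are unique and, in Python 3.7+, keep their insertion order
    return list(dict.fromkeys(items))

c = ['1*\n', '3*\n', '4*\n', '6\n', '1*\n', '2\n', '3*\n', '4*\n', '5\n']
unique = unique_in_order(c)
d = sorted(unique)
```

Sorting is still a separate step, just as d.sort() is in the code above.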
HTH
pan