[Tutor] Re: Comparing lines in two files, writing result into a third

stuart_clemons@us.ibm.com stuart_clemons@us.ibm.com
Wed Apr 23 17:39:43 2003

Hi Pan:

Thanks for the detailed response.  I hope to have time tomorrow morning to
try this out and analyze it further.  The steps are very clear, but I need
to spend more time on the code to fully understand it, though it's
definitely not over my head and I think I mostly know what's going on.  You
make it look very easy. Thanks again.

- Stuart

To:       tutor@python.org
cc:       stuart_clemons@us.ibm.com
Date:     04/23/03 05:08 PM
Subject:  Re: Comparing lines in two files, writing result into a third

Hi Danny and Stuart,

IMO, Stuart's question can be solved without worrying about the
sort order. To me the following steps are more straightforward:

1. load the files into 2 lists (a, b)
2. concatenate them (a+b)
3. build a new list c from a+b. When building, check
   the count of each element in a+b. If count > 1, then append *
4. remove duplicate items from c to get list d.

Put into code:


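f1 = open('c:/py/digest/f1.txt', 'r')
f2 = open('c:/py/digest/f2.txt', 'r')
f3 = open('c:/py/digest/f3.txt', 'w')

a = [x.strip() for x in f1.readlines()]  # a= ['1','3','4', '6']
b = [x.strip() for x in f2.readlines()]  # b= ['1','2','3','4','5']
# Append '*' to any item that occurs more than once in a+b
c = [((a+b).count(x)>1) and (x+'*\n') or (x+'\n') for x in a+b]

seen = {}  # dict of items already kept (avoids shadowing the name 'set')
d = [seen.setdefault(x,x) for x in c if x not in seen]  # Remove duplicates

d.sort()          # first-occurrence order would otherwise put 2 and 5 after 6
f3.writelines(d)  # write the result into the third file
f1.close()
f2.close()
f3.close()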



It takes only 13 lines of code in total. The lists a, b, c, d will be:

a= ['1', '3', '4', '6']
b= ['1', '2', '3', '4', '5']
c= ['1*\n', '3*\n', '4*\n', '6\n', '1*\n', '2\n', '3*\n', '4*\n', '5\n']
d= ['1*\n', '2\n', '3*\n', '4*\n', '5\n', '6\n']

# The 'duplicate removing code' is from :
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
#   by Raymond Hettinger, 2002/03/17
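
For anyone reading this with a newer Python: the same four steps can probably
be written a bit more directly. The following is only a sketch, not the code
from this thread. It assumes collections.Counter and insertion-ordered dicts
are available (Python 3.7+), and the function name mark_common_lines is made
up for illustration. As in the code above, an item repeated inside a single
file also gets a '*'.

```python
from collections import Counter

def mark_common_lines(a, b):
    """Return the sorted, de-duplicated items of a+b, with '*' on repeated ones."""
    combined = a + b
    counts = Counter(combined)  # count every item once, instead of list.count() per item
    marked = [x + '*' if counts[x] > 1 else x for x in combined]
    unique = list(dict.fromkeys(marked))  # keeps the first occurrence of each item
    return sorted(unique)  # plain string sort, fine for single-digit items

a = ['1', '3', '4', '6']
b = ['1', '2', '3', '4', '5']
result = mark_common_lines(a, b)
print(result)  # ['1*', '2', '3*', '4*', '5', '6']
```

To write the result to a file, something like
f3.writelines(x + '\n' for x in result) inside a "with open(...)" block should
do, and it closes the file automatically.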