[Tutor] Comparing lines in two files, writing result into a third file

stuart_clemons@us.ibm.com stuart_clemons@us.ibm.com
Thu Apr 24 07:52:02 2003




Hi Scott:

Thanks for laying out the dictionary structure for me.  I wanted to use
dictionaries a year ago for something I was working on, but I couldn't get
dictionaries to work for me (it was very frustrating), so I ended up
hacking something else together.   I think that was about the last time I
needed to use Python for anything.

Anyway, I'm going to try using this structure to solve the problem I'm
working on.  I need to produce this "merged" list fairly quickly (like
today) and then on a regular basis.

To Danny and Pan:  Thanks very much for contributing thoughts and code
related to this problem.  Since it looks like I'll have a need to use
Python for the foreseeable future, as a learning exercise, I'm going to try
each of these approaches to this problem.   Most of the work I'll need
Python for is similar to this problem.  (Next up is formatting a dump of a
text log file into a readable report. I think I know how to handle this
one, but if not, as Arnold would say, I'll be back!)
Thanks again.

- Stuart



> Concerning dictionaries, do you think a dictionary is the structure
> to use? If so, I'll try to spend some time reading up on
> dictionaries.  I do remember having problems reading a file into a
> dictionary when I tried it a year ago or so.

Since you're pressed for time, I can give you a basic script using a
dictionary....

#####
d = {} # Start with an empty dictionary

f1 = open('file1.txt', 'r')
for num in f1:
    num = num.strip()       # get rid of any nasty newlines
    d[num] = 1              # and populate
f1.close()

f2 = open('file2.txt', 'r')
for num in f2:
    num = num.strip()                # again with the newlines
    if num in d: d[num] += 1         # <- increment value, or
    else: d[num] = 1                 # <- create a new key
f2.close()

nums = sorted(d)           # sorted list of the keys (string order)
f3 = open('file3.txt', 'w')
for num in nums:
    f3.write(num)          # Here we put the
    if d[num] > 1:         # newlines back, either
        f3.write("*\n")    # <- with
    else:                  # or
        f3.write("\n")     # <- without
f3.close()                 # the asterisk
####

Should be fairly quick. And it's certainly easier to flash-parse with the
naked eye than a value-packed list comprehension.
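
For comparison, here is a rough Python 3 sketch of the same idea using
sets instead of a counting dictionary. This is a swapped-in technique,
not the script above. The function names (merge_marked, merge_files)
are made up for illustration, and it assumes each line is one item,
that blank lines can be skipped, and that an asterisk should mean
"present in both files". Unlike the counting version, a line repeated
only within file2 does not pick up an asterisk.

```python
def merge_marked(lines1, lines2):
    """Return the sorted, de-duplicated items from both inputs.

    Items found in BOTH inputs get a trailing '*'.
    (Hypothetical helper; blank lines are skipped by assumption.)
    """
    first = set()
    for line in lines1:
        item = line.strip()          # drop newlines and surrounding spaces
        if item:
            first.add(item)

    second = set()
    for line in lines2:
        item = line.strip()
        if item:
            second.add(item)

    merged = []
    for item in sorted(first | second):   # union of both, in string order
        if item in first and item in second:
            merged.append(item + "*")     # present in both files
        else:
            merged.append(item)
    return merged


def merge_files(path1, path2, out_path):
    """Read two text files and write the marked, merged list to out_path."""
    with open(path1) as f1, open(path2) as f2:
        merged = merge_marked(f1, f2)
    with open(out_path, "w") as out:
        for item in merged:
            out.write(item + "\n")
```

Calling merge_files('file1.txt', 'file2.txt', 'file3.txt') would then
write the merged list. Note that the items are sorted as strings, as in
the script above, so "10" sorts before "9".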

HTH
Scott