[Tutor] Comparing files, Counting Value

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Tue Feb 1 01:21:53 CET 2005



Hi Michiyo,


Ok, let's take a look at the code.

> i=open("file 1") #value data
> o=open("file 2") #look-up file
> l=open("result", 'w')#result

We strongly recommend renaming these variables to something longer than a
single character.

It's difficult to tell here what 'i', 'o', and 'l' mean, outside of the
context of the assignments.  The letters 'i' and 'o' often stand for the
words "input" and "output".  The way that you're using these as variable
names for input files will break the expectation of people who read the
code.  Furthermore, 'l' can easily be misread as 'i'.

In short, those names should be changed to something readable. This is a
pure style issue, but I think it's important as programmers to make the
code easy for humans to understand.



> results={}
>
> line=i.readline()
> line=o.readline()

This looks problematic.  The 'line' name here, by the end of these two
statements, is bound to the value of the look-up file's line.  I believe
you need to keep those values distinct.
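
As a minimal sketch of what I mean by keeping them distinct: the names
'value_file', 'lookup_file', 'value_line' and 'lookup_line' below are just
my suggestions, and I'm using io.StringIO with made-up sample text as a
stand-in for your real files, since I don't have them.

```python
import io

# Stand-ins for the two open files; the contents are made-up sample data.
value_file = io.StringIO("299 189 8.543e-02\n")
lookup_file = io.StringIO("299 189\n")

# Each line gets its own name, so neither one overwrites the other.
value_line = value_file.readline()    # first line of the value data
lookup_line = lookup_file.readline()  # first line of the look-up file

print(repr(value_line))
print(repr(lookup_line))
```

With two separate names, both lines are still available after the second
readline() call.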



> while line:
>      fields=line.split()
>      x1=fields[0, 1] in i    #x,y position in file 1
>      z=fields[2] in i           #value data in file 1
>      x2=fields[0, 1] in o   #x,y position in file 2


There are several fundamental issues with this.


Conventionally, a file-loop has the following structure:

###
for line in inputFile:
    # ... do something with that line
###

and anything that tries to iterate across a file in a different way should
be looked at with some care.  The way that your code is trying to iterate
across the file won't work.
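
Here is a small sketch of that loop in action.  I'm assuming each line of
the value data looks like "x y value", which is my guess from your
comments, and I'm using io.StringIO in place of a real file so that the
example runs on its own:

```python
import io

# Stand-in for open("file 1"); the two lines are made-up sample data.
input_file = io.StringIO("299 189 8.543e-02\n300 189 0.000e+00\n")

line_count = 0
for line in input_file:
    # The loop hands us one line at a time and stops at end of file.
    fields = line.split()
    line_count += 1
    print(fields)

print(line_count, "lines read")
```

There's no need to call readline() by hand: the for loop fetches each new
line for us.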



We strongly recommend you read through a tutorial like:

    http://www.freenetpages.co.uk/hp/alan.gauld/tutfiles.htm

which has examples of how to write programs that work with files.



The way the program's structured also seems a bit monolithic.  I'd
recommend breaking down the problem into some phases:

    Phase 1: read the value-data file into some data structure.

    Phase 2: taking that data structure, read in the lookup file and
    identify which positions are matching, and record matches in the
    output file.

This partitioning of the problem should allow you to work on either phase
of the program without having to get it correct all at once.

The phases can be decoupled because we can easily feed in some kind of
hardcoded data structure into Phase 2, just to check that the lookup-file
matching is doing something reasonable.


For example:

###
hardcodedDataStructure = { (299, 189) : 8.543e-02,
                           (300, 189) : 0.000e+00,
                           (301, 189) : 0.000e+00,
                           (1, 188)   : 5.108e-02
                         }
###

is a very small subset of the information that your value data contains.

We can then work on the second part of the program using
'hardcodedDataStructure'.  Later on, when we do get Phase 1 working ok, we
can use the result of Phase 1 instead of the hardcodedDataStructure, and
it should just fit into place.
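
To make that concrete, here's one rough way Phase 2 might look when fed
the hardcoded data.  The function name 'write_matches', the assumption that
each look-up line holds "x y", and the output format are all my own
guesses, so adjust them to fit what your files really contain:

```python
import io

def write_matches(value_data, lookup_file, result_file):
    """Write one line per look-up position that also appears in value_data."""
    match_count = 0
    for line in lookup_file:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        position = (int(fields[0]), int(fields[1]))
        if position in value_data:
            # Assumed output format: x, y, then the value for that position.
            result_file.write("%d %d %s\n"
                              % (position[0], position[1], value_data[position]))
            match_count += 1
    return match_count

hardcodedDataStructure = { (299, 189) : 8.543e-02,
                           (300, 189) : 0.000e+00,
                           (301, 189) : 0.000e+00,
                           (1, 188)   : 5.108e-02
                         }

# Stand-ins for the look-up file and the result file (made-up contents).
lookup_file = io.StringIO("299 189\n5 5\n1 188\n")
result_file = io.StringIO()

matches = write_matches(hardcodedDataStructure, lookup_file, result_file)
print(matches, "matching positions")
print(result_file.getvalue())
```

Because the function only needs a dictionary and two file-like objects, we
can test it without ever touching the real value-data file.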

Does this make sense?  Don't try writing the program all at once and then
start trying to make it all work.  Instead, try building small, simple
programs that do work, and then put those together.
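
In the same spirit, Phase 1 can be its own small program.  This is only a
sketch: 'read_value_data' is a name I made up, and I'm assuming each line
of the value data is "x y value" separated by whitespace.

```python
import io

def read_value_data(value_file):
    """Build a dictionary mapping (x, y) positions to their values."""
    value_data = {}
    for line in value_file:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        position = (int(fields[0]), int(fields[1]))
        value_data[position] = float(fields[2])
    return value_data

# Stand-in for the real value-data file (made-up sample lines).
sample_file = io.StringIO("299 189 8.543e-02\n"
                          "300 189 0.000e+00\n"
                          "1 188 5.108e-02\n")

value_data = read_value_data(sample_file)
print(value_data)
```

The dictionary it returns has the same shape as the hardcoded one, so it
can be dropped in wherever the hardcoded data was used.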



The statements:

>      fields=line.split()
>      x1=fields[0, 1] in i    #x,y position in file 1

won't work.


'fields[0, 1]' does not represent the first and second elements of
'fields': it means something else, but you should get errors from it
anyway.  For example:

###
>>> values = ["hello", "world", "testing"]
>>> values[0, 1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: list indices must be integers
###

So you should have already seen TypeErrors by this point of the program.


What you probably meant to write was:

###
fields = line.split()
x1 = (fields[0], fields[1])
###

Alternatively, using a slice (note that this gives a list rather than a
tuple):

###
fields = line.split()
x1 = fields[0:2]
###

The significance of accidentally using the comma there won't make too much
sense until you learn about tuples and dictionaries, so I won't press on
this too much.
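
If you're curious, though, here is a short sketch of the idea.  The comma
in 'fields[0, 1]' builds the tuple (0, 1) and tries to use it as a single
index, which a list rejects.  Tuples do work as dictionary keys, while
lists (such as a slice like fields[0:2]) don't.  The sample line is made
up:

```python
fields = "299 189 8.543e-02".split()

# fields[0, 1] is the same as fields[(0, 1)]: one index that is a tuple.
try:
    fields[0, 1]
    tuple_index_worked = True
except TypeError:
    tuple_index_worked = False

# A tuple of the first two fields can serve as a dictionary key.
position = (fields[0], fields[1])
values = {position: fields[2]}
lookup_ok = ("299", "189") in values

# A list cannot be a dictionary key, because lists are not hashable.
try:
    values[fields[0:2]] = fields[2]
    list_key_worked = True
except TypeError:
    list_key_worked = False

print(tuple_index_worked, lookup_ok, list_key_worked)
```

That's why the tuple form is the one to use if the positions are going to
be keys in a dictionary.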



I'd recommend that you read through one of the Python tutorials before
trying to finish the program.  Some of the things that your program
contains are... well, truthfully, a little wacky.  There are several good
tutorials linked here:

    http://www.python.org/moin/BeginnersGuide/NonProgrammers


If you have more questions, please feel free to ask!


