[Tutor] re.compile and objects - what am doing wrong?

Jacob S. keridee at jayco.net
Mon Dec 27 20:39:25 CET 2004


Hi,

[Blah,blah,blah,text cut]

>
> #! /usr/bin/python
> import string, re
First, I like to use string methods instead of importing the string module,
so let's get rid of that.

> dict1={}
> dict2={}
> inputFile1 = open("rough.txt", "rb")
> inputFile2 = open("polished.txt", "rb")
> for row in inputFile1.readlines():

Next, file objects have been iterable since Python 2.2, so you don't have
to call the method readlines() on the file. You can just use
    for row in inputFile1:
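
Here is a small sketch of that, in case it helps. I use io.StringIO as a stand-in for the open file so the example runs without rough.txt on disk; the sample rows are made up, and I wrote it to run on a current Python.

```python
import io

# Made-up sample data standing in for the contents of rough.txt
sample = io.StringIO("apple\tred\nbanana\tyellow\n")

rows = []
for row in sample:      # iterate the file object directly, no readlines()
    rows.append(row)

print(rows)
```

A real file returned by open() behaves the same way in the for loop.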

> words = string.split(row,"\t")

We change this to use a string method:  words = row.split("\t")
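
A quick sketch of what the split gives you, using a made-up row. One thing to watch, I think: the last field keeps the newline from the end of the row unless you strip it first.

```python
row = "apple\tred\n"    # made-up row, as it would come from the file

words = row.split("\t")
print(words)            # ['apple', 'red\n']  (newline still attached)

# Stripping the newline before splitting gives clean fields
clean = row.rstrip("\n").split("\t")
print(clean)            # ['apple', 'red']
```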

> dict1[words[0]]=words[1]
>
> for row in inputFile2.readlines():
> words = string.split(row,"\t")
> dict2[words[0]]=words[1]

Do all of the same stuff we just did to the above section.

> outFile1 = open ("rough.out", "w")
> outFile2 = open ("polish.out", "w")
> polishKeys=dict2.keys()
> roughKeys=dict1.keys()
> for key in polishKeys:
> searchPat=re.compile("%s") % key # Doesn't work

Now, what you seem to be using here is % formatting.
The first thing I see wrong: re.compile() returns a compiled pattern object,
and the % operator doesn't work on that. You should put the % key inside the
parentheses with the string, so it reads re.compile("%s" % key)
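
To show the difference, here is a rough sketch with a made-up key. The re.escape() part is my own addition, not something you asked about: I'm assuming you may want the key matched literally, since characters like "." mean something special in a pattern.

```python
import re

key = "a.c"   # made-up key, chosen because "." is special in a regex

# Formatting the compiled pattern fails: % is not defined for pattern objects
try:
    re.compile("%s") % key
    failed = False
except TypeError:
    failed = True
print(failed)                                        # True

# Format the string first, then compile it
searchPat = re.compile("%s" % key)
print(searchPat.search("xxabcxx") is not None)       # True: "." matches "b"

# Assumption: if the key should match literally, escape it first
literalPat = re.compile(re.escape(key))
print(literalPat.search("xxabcxx") is None)          # True: no literal "a.c"
print(literalPat.search("xxa.cxx") is not None)      # True
```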

The next thing I'm not so sure of, because of the lost indentation,
but I think that you are looping through all of roughKeys for each polishKey,
checking whether the polish key occurs inside the rough key, and printing
the pair if it does. Right?

Having said that, I suggest we get rid of the re module and again use string
methods.
So, we use this.

# I get rid of % formatting because key is already a string!
for searchPat in polishKeys:
    for rKey in roughKeys:
        # "in" on two strings is a plain substring test
        if searchPat in rKey:
            print(searchPat + "----" + rKey)
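
The same nested loop can be wrapped in a function so you can try it on small lists first. This is only a sketch: find_matches and the sample keys are names I made up for illustration.

```python
def find_matches(polish_keys, rough_keys):
    """Return (polish_key, rough_key) pairs where the polish key
    occurs as a substring of the rough key."""
    matches = []
    for search_pat in polish_keys:
        for r_key in rough_keys:
            if search_pat in r_key:      # substring test on strings
                matches.append((search_pat, r_key))
    return matches

# Made-up keys, just to see what comes out
pairs = find_matches(["cat", "dog"], ["catalog", "hotdog", "bird"])
for polish, rough in pairs:
    print(polish + "----" + rough)
```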

> outFile1.close()
> outFile2.close()
> inputFile1.close()
> inputFile2.close()

Having said all of that, I see one more thing--Why on earth use dictionaries
if all you're using is the list of keys?
Why not just use lists?
So, the script as I see it is:

#!/usr/bin/python

roughkeys = []
polishkeys = []
# Text mode ("r") so that split("\t") also works on Python 3
inputFile1 = open("rough.txt", "r")
inputFile2 = open("polished.txt", "r")

# Keep only the first tab-separated field of each row
for row in inputFile1:
    roughkeys.append(row.split("\t")[0])
for row in inputFile2:
    polishkeys.append(row.split("\t")[0])


outFile1 = open("rough.out", "w")
outFile2 = open("polish.out", "w")

# Print every polish key that occurs inside a rough key
for key in polishkeys:
    for rkey in roughkeys:
        if key in rkey:
            print(key + "----" + rkey)

outFile1.close()
outFile2.close()
inputFile1.close()
inputFile2.close()


But!!!! Even better. As I ramble on, I see that list comprehensions can make
things easier!

#!/usr/bin/python

# Text mode ("r") so that split("\t") also works on Python 3
inputFile1 = open("rough.txt", "r")
inputFile2 = open("polished.txt", "r")

# Each comprehension collects the first tab-separated field of every row
roughkeys = [row.split("\t")[0] for row in inputFile1]
polishkeys = [row.split("\t")[0] for row in inputFile2]

outFile1 = open("rough.out", "w")
outFile2 = open("polish.out", "w")

# Print every polish key that occurs inside a rough key
for key in polishkeys:
    for rkey in roughkeys:
        if key in rkey:
            print(key + "----" + rkey)

outFile1.close()
outFile2.close()
inputFile1.close()
inputFile2.close()
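
One more thought: the script opens two output files but only ever prints. I'm guessing (and it is only a guess) that the matches were meant to end up in a file. Here is a sketch of that using io.StringIO objects in place of the real files, with made-up sample rows, so you can run it anywhere on a current Python.

```python
import io

# Stand-ins for rough.txt, polished.txt and an output file (made-up data)
rough_file = io.StringIO("catalog\tx\nhotdog\ty\n")
polish_file = io.StringIO("cat\t1\ndog\t2\n")
out_file = io.StringIO()

# First tab-separated field of every row, as in the comprehensions above
roughkeys = [row.split("\t")[0] for row in rough_file]
polishkeys = [row.split("\t")[0] for row in polish_file]

# Write each match to the output instead of printing it
for key in polishkeys:
    for rkey in roughkeys:
        if key in rkey:
            out_file.write(key + "----" + rkey + "\n")

result = out_file.getvalue()
print(result)
```

With real files you would call write() on the object returned by open("rough.out", "w") in the same way.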

Does this work? Does it help? It was fun!

Jacob Schmidt


