[Tutor] re.compile and objects - what am I doing wrong?
Jacob S.
keridee at jayco.net
Mon Dec 27 20:39:25 CET 2004
Hi,
[Blah, blah, blah, text cut]
>
> #! /usr/bin/python
> import string, re
First, I prefer string methods to importing the string module,
so let's get rid of that import.
> dict1={}
> dict2={}
> inputFile1 = open("rough.txt", "rb")
> inputFile2 = open("polished.txt", "rb")
> for row in inputFile1.readlines():
Next, since Python 2.2, file objects are iterators, so you don't have to
call readlines() on the file (which reads the whole file into memory at
once). You can just write: for row in inputFile1:
>     words = string.split(row,"\t")
We change this to use a string method: words = row.split("\t")
>     dict1[words[0]]=words[1]
>
> for row in inputFile2.readlines():
>     words = string.split(row,"\t")
>     dict2[words[0]]=words[1]
Make the same changes to this loop as to the one above.
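Here is a small sketch of both changes together: iterating over the file object directly and splitting with a string method. Since the real rough.txt isn't shown, it writes a hypothetical two-line tab-separated file (sample.txt, with made-up contents) to a temporary directory first.

```python
import os
import tempfile

# Hypothetical tab-separated data standing in for rough.txt.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
out = open(path, "w")
out.write("apple\tred\nbanana\tyellow\n")
out.close()

dict1 = {}
inputFile = open(path, "r")
for row in inputFile:            # the file object yields one line at a time
    words = row.split("\t")      # string method instead of string.split(row, "\t")
    dict1[words[0]] = words[1]
inputFile.close()

print(dict1)
```

Note that words[1] keeps the trailing newline, exactly as in the original code; calling row.rstrip("\n") before splitting would drop it.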
> outFile1 = open ("rough.out", "w")
> outFile2 = open ("polish.out", "w")
> polishKeys=dict2.keys()
> roughKeys=dict1.keys()
> for key in polishKeys:
>     searchPat=re.compile("%s") % key # Doesn't work
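Now, what you seem to be using here is % formatting.
The first thing I see wrong: the % key belongs inside the parentheses,
with the string, so it reads re.compile("%s" % key).
As written, re.compile("%s") compiles the literal pattern "%s", and then %
is applied to the compiled pattern object, which doesn't support it, so
Python raises a TypeError.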
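A quick way to see the difference, using a hypothetical key. It also shows re.escape(), which makes regex metacharacters such as . match literally; whether your real keys contain such characters is an assumption, since the data isn't shown.

```python
import re

key = "a.c"  # hypothetical key containing a regex metacharacter

# Wrong: % is applied to the compiled pattern object, not to the string.
failed = False
try:
    re.compile("%s") % key
except TypeError:
    failed = True

# Right: format the string first, then compile it.
loose = re.compile("%s" % key)        # "." matches any character
strict = re.compile(re.escape(key))   # "." matches only a literal dot

print(failed)
print(bool(loose.search("abc")))
print(bool(strict.search("abc")))
```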
The next part I'm less sure of because the indentation was lost,
but I think you are looping over all of roughKeys for each polishKey,
checking whether the polish key occurs inside the rough key, and printing
the pair. Right?
If so, I suggest we drop the re module too and use plain string operations,
since the in operator already tests whether one string contains another.
So, we use this:
# I get rid of % formatting because key is already a string!
for searchPat in polishKeys:
    for rKey in roughKeys:
        if searchPat in rKey:
            print(searchPat + "----" + rKey)
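The same loop wrapped in a function so it can be tried on a few made-up keys. find_matches is a hypothetical name, and it returns the pairs instead of printing them. Note that in does a literal substring test, so unlike a regular expression, a . in a key matches only a dot.

```python
def find_matches(polish_keys, rough_keys):
    """Return (polish_key, rough_key) pairs where polish_key occurs inside rough_key."""
    matches = []
    for search_pat in polish_keys:
        for r_key in rough_keys:
            if search_pat in r_key:  # plain substring test, no regex involved
                matches.append((search_pat, r_key))
    return matches

# Hypothetical sample keys.
for pair in find_matches(["cat", "a.c"], ["concatenate", "abc", "dog"]):
    print(pair[0] + "----" + pair[1])
```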
> outFile1.close()
> outFile2.close()
> inputFile1.close()
> inputFile2.close()
Having said all of that, I see one more thing: why on earth use dictionaries
if all you're using is the list of keys?
Why not just use lists?
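One difference to keep in mind when switching: a dict keeps only one entry per key (a later row with the same key overwrites the earlier one), while a list of first columns keeps every row, duplicates included. A sketch with made-up rows:

```python
rows = ["apple\t1\n", "banana\t2\n", "apple\t3\n"]  # hypothetical input lines

as_dict = {}
for row in rows:
    words = row.split("\t")
    as_dict[words[0]] = words[1]  # the second "apple" row overwrites the first

as_list = []
for row in rows:
    as_list.append(row.split("\t")[0])  # every row's key is kept

print(sorted(as_dict.keys()))
print(as_list)
```

For the substring search here, that mostly means a key that appears on several lines may print the same match more than once.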
So, the script as I see it is:
#! /usr/bin/python
roughkeys = []
polishkeys = []
# Text mode ("r"): in Python 3, "rb" yields bytes, and splitting bytes on "\t" fails.
inputFile1 = open("rough.txt", "r")
inputFile2 = open("polished.txt", "r")
for row in inputFile1:
    roughkeys.append(row.split("\t")[0])
for row in inputFile2:
    polishkeys.append(row.split("\t")[0])
# Kept from the original script; nothing below writes to these files yet.
outFile1 = open("rough.out", "w")
outFile2 = open("polish.out", "w")
for key in polishkeys:
    for rkey in roughkeys:
        if key in rkey:
            print(key + "----" + rkey)
outFile1.close()
outFile2.close()
inputFile1.close()
inputFile2.close()
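If your Python is 2.6 or newer, the with statement closes each file automatically, even if an exception occurs, so the manual close() calls are no longer needed. Here is a sketch of the reading part under that assumption; first_column is a hypothetical helper name, and the demo writes its own temporary file because the real data isn't available.

```python
import os
import tempfile

def first_column(path):
    """Return the first tab-separated field of every line in the file at path."""
    keys = []
    with open(path, "r") as input_file:  # closed automatically when the block ends
        for row in input_file:
            keys.append(row.split("\t")[0])
    return keys

# Demo with a hypothetical temporary file.
path = os.path.join(tempfile.mkdtemp(), "rough.txt")
with open(path, "w") as out:
    out.write("concatenate\tx\nabc\ty\n")

print(first_column(path))
```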
But!!!! Even better: as I ramble on, I see that list comprehensions can make
things easier!
#! /usr/bin/python
# Text mode ("r"): in Python 3, "rb" yields bytes, and splitting bytes on "\t" fails.
inputFile1 = open("rough.txt", "r")
inputFile2 = open("polished.txt", "r")
roughkeys = [row.split("\t")[0] for row in inputFile1]
polishkeys = [row.split("\t")[0] for row in inputFile2]
# Kept from the original script; nothing below writes to these files yet.
outFile1 = open("rough.out", "w")
outFile2 = open("polish.out", "w")
for key in polishkeys:
    for rkey in roughkeys:
        if key in rkey:
            print(key + "----" + rkey)
outFile1.close()
outFile2.close()
inputFile1.close()
inputFile2.close()
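The list comprehension builds exactly the same list as a loop that appends to an empty list. A quick check on some made-up lines; a list of strings stands in for the file, since both are iterated the same way.

```python
lines = ["concatenate\tx\n", "abc\ty\n", "dog\tz\n"]  # hypothetical input lines

# Loop-and-append version.
looped = []
for row in lines:
    looped.append(row.split("\t")[0])

# List comprehension version: same result in one expression.
comprehended = [row.split("\t")[0] for row in lines]

print(looped == comprehended)
```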
Does this work? Does it help? It was fun!
Jacob Schmidt