[Tutor] Replacing fields in lines of various lengths
Alan Gauld
alan.gauld at btinternet.com
Tue May 5 11:43:22 CEST 2009
"Dan Liang" <danliang20 at gmail.com> wrote
> And I put together the code below based on your suggestions, with minor
> changes and it does work.
Good, now your question is?
-------------Begin code----------------------------
#!/usr/bin/python
import sys

tags = {
    'noun-prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}
TAB = '\t'

def newlyTaggedWord(line):
    line = line.rstrip()             # I strip line ending
    (word,tag) = line.split(TAB)     # separate parts of line, keeping data only
    new_tags = tags[tag]             # read in dict
    tagging = TAB.join(new_tags)     # join with TABs
    return word + TAB + tagging      # formatted result

def replaceTagging(source_name, target_name):
    target_file = open(target_name, "w")
    # replacement loop
    for line in open(source_name, "r"):
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)

source_name.close()
target_file.close()
AG> These two lines should be inside the function, after the loop.
if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)
-------------End code----------------------------
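A minimal sketch of what the inline note above seems to ask for, assuming the closing belongs inside replaceTagging. Note that close() is a method of a file object, so calling it on source_name (a string) would fail; the sketch keeps a handle on the source file instead. The with statement is a substitution made here, not something from the posted code, and the tag table is cut down to two entries for illustration.

```python
TAB = '\t'

# Two entries copied from the posted tag table, for illustration only.
tags = {
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
}

def newlyTaggedWord(line):
    # Two-field format: word TAB composite_tag
    word, tag = line.rstrip('\n').split(TAB)
    return word + TAB + TAB.join(tags[tag])

def replaceTagging(source_name, target_name):
    # Both files are opened inside the function. The with statement
    # closes them when the block ends, i.e. after the loop has finished.
    with open(source_name, "r") as source_file, \
         open(target_name, "w") as target_file:
        for line in source_file:
            target_file.write(newlyTaggedWord(line) + '\n')
```

Explicit source_file.close() and target_file.close() calls placed after the loop, inside the function, would do the same job.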
Now I have to work on a different data format, as follows:
-------------Begin data----------------------------
w1 \t case_def_acc \t yes
w2 \t noun_prop \t no
w3 \t case_def_gen \t
w4 \t dem_pron_f \t no
w3 \t case_def_gen \t
w4 \t dem_pron_f \t no
w1 \t case_def_acc \t yes
w3 \t case_def_gen \t
w3 \t case_def_gen \t
-------------End data----------------------------
Notice that some lines have nothing in the yes/no field, and hence end in a
tab.
My question is how to replace the data in the composite-tag field with
sub-tags like those in the dictionary values above, and still print the
whole line with only this change (i.e., composite tags replaced by
sub-tags). Earlier, we read the word and tag from each line and looked the
tag up in the dictionary directly, since we were sure each line had 2 fields
after splitting on tabs. Here, lines have varying numbers of fields: some
end with yes or no, and some do not.
I tried to change the code above by modifying the function where we read
the dictionary, but it did not work. While it is ugly, I include it as
proof that I have worked on the problem. I am sure you will have various
nice ideas.
-------------Begin code----------------------------
def newlyTaggedWord(line):
    tagging = ""
    line = line.split(TAB)    # separate parts of line, keeping data only
    if len(line)==3:
        word = line[-3]
        tag = line[-2]
        new_tags = tags[tag]
        decision = line[-1]
        # in decision I wanted to store either yes or no if one of these existed
    elif len(line)==2:
        word = line[-2]
        tag = line[-1]
        decision = TAB
        # I thought if it is a must to put sth in decision while decision is
        # really absent in line, I would put a tab. But I really want to avoid
        # putting anything there.
    new_tags = tags[tag]            # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging + TAB + decision
-------------End code----------------------------
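One possible approach to the variable-length lines, offered as a sketch rather than the thread's answer: split on tabs, swap only the second field for its sub-tags, and join everything back, so any trailing fields (yes, no, or an empty one) pass through untouched. It assumes the fields are separated by real tab characters with no surrounding spaces, that the composite tag is always the second field, and that the dictionary key is spelled 'noun_prop' as in the sample data. Only the newline is stripped, because rstrip() with no argument would also remove the trailing tab of lines with an empty yes/no field.

```python
TAB = '\t'

tags = {
    'noun_prop': 'noun_prop null null'.split(),    # key spelled as in the sample data (assumption)
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

def newlyTaggedWord(line):
    # Strip only the newline so an empty last field (trailing tab) survives.
    fields = line.rstrip('\n').split(TAB)
    # fields[0] is the word, fields[1] the composite tag (assumed position).
    # Slice assignment replaces that one field with its three sub-tags and
    # leaves every other field, however many there are, as it was.
    fields[1:2] = tags[fields[1]]
    return TAB.join(fields)

if __name__ == "__main__":
    for sample in ("w1\tcase_def_acc\tyes\n", "w3\tcase_def_gen\t\n"):
        print(repr(newlyTaggedWord(sample)))
```

Because the function returns the line without its newline, it can be dropped into the existing replaceTagging loop, which appends '\n' itself. A tag missing from the dictionary raises KeyError here; whether to skip or report such lines is left open.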
I appreciate your support!
--dan