[Tutor] Replacing fields in lines of various lengths
Dan Liang
danliang20 at gmail.com
Tue May 5 06:22:45 CEST 2009
(Please disregard my earlier message that was sent by mistake before I
finished composing. Sorry about that! :().
Hello Spir, Alan, and Paul, and tutors,
Thank you Spir, Alan, and Paul for your help with my previous code! Earlier,
I was asking how to separate a composite tag like the one in field 2 below
with sub-tags like those in the values of the dictionary below. In my
original question, I was asking about data formatted as follows:
w1 \t case_def_acc
w2 \t noun_prop
w3 \t case_def_gen
w4 \t dem_pron_f
I put together the code below based on your suggestions, with minor
changes, and it does work.
-------------Begin code----------------------------
#!/usr/bin/python
import sys

# composite tag -> list of three sub-tags
tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}
TAB = '\t'

def newlyTaggedWord(line):
    line = line.rstrip()             # I strip line ending
    (word, tag) = line.split(TAB)    # separate parts of line, keeping data only
    new_tags = tags[tag]             # read in dict
    tagging = TAB.join(new_tags)     # join with TABs
    return word + TAB + tagging      # formatted result

def replaceTagging(source_name, target_name):
    source_file = open(source_name, "r")
    target_file = open(target_name, "w")
    # replacement loop
    for line in source_file:
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)
    source_file.close()              # close the file object, not the name string
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)
-------------End code----------------------------
Now I have to work on a different data format, as follows:
-------------Begin data----------------------------
w1 \t case_def_acc \t yes
w2 \t noun_prop \t no
w3 \t case_def_gen \t
w4 \t dem_pron_f \t no
w3 \t case_def_gen \t
w4 \t dem_pron_f \t no
w1 \t case_def_acc \t yes
w3 \t case_def_gen \t
w3 \t case_def_gen \t
-------------End data----------------------------
Notice that some lines have nothing in the yes/no field, and hence end in a
tab.
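To make the point about the empty field concrete, here is a small sketch of how such lines split. It assumes the `\t` in the listing above stands for a single real tab character with no spaces around it; `split_fields` is a hypothetical helper name, not part of the earlier code.

```python
TAB = '\t'

def split_fields(line):
    # Strip only the line ending, so a trailing tab (empty last field) survives.
    return line.rstrip('\r\n').split(TAB)

print(split_fields('w1\tcase_def_acc\tyes\n'))  # ['w1', 'case_def_acc', 'yes']
print(split_fields('w3\tcase_def_gen\t\n'))     # ['w3', 'case_def_gen', '']

# A bare rstrip() also removes the trailing tab, so the empty field is lost:
print('w3\tcase_def_gen\t\n'.rstrip().split(TAB))  # ['w3', 'case_def_gen']
```

So whether a line with an empty yes/no field shows up as two or three fields depends on how the line ending is stripped before splitting.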
My question is how to replace the data in the field of composite tags with
sub-tags like those in the dictionary values above, and still be able to
print the whole line with only this change (i.e., composite tags replaced by
sub-tags). Earlier, we read the word and tag from each line directly and
looked the tag up in the dictionary, since we were sure each line had 2
fields after splitting on tabs. Here, lines have a varying number of fields:
some end with yes or no, and some do not.
I tried to change the code above by modifying the function where we read
the dictionary, but it did not work. While it is ugly, I include it as
proof that I have worked on the problem. I am sure you will have various
nice ideas.
-------------Begin code----------------------------
def newlyTaggedWord(line):
    tagging = ""
    line = line.split(TAB)    # separate parts of line, keeping data only
    if len(line) == 3:
        word = line[-3]
        tag = line[-2]
        new_tags = tags[tag]
        decision = line[-1]
        # in decision I wanted to store either yes or no, if one of these existed
    elif len(line) == 2:
        word = line[-2]
        tag = line[-1]
        decision = TAB
        # I thought that if I must put something in decision while decision is
        # really absent from the line, I would put a tab. But I really want to
        # avoid putting anything there.
    new_tags = tags[tag]            # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging + TAB + decision
-------------End code----------------------------
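For illustration only, one possible direction is to treat everything after the tag as "the rest of the line" and pass it through unchanged, so the number of trailing fields no longer matters. This is a minimal sketch, not a tested solution for the real data: `retag_line` is a hypothetical name, the `tags` dictionary is copied from the first listing (with the `noun_prop` key spelled as in the data), and the fields are assumed to be separated by single real tabs.

```python
TAB = '\t'

# Same mapping as in the first listing: composite tag -> three sub-tags.
tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

def retag_line(line):
    # Hypothetical helper: replace only the second field, keep the rest as is.
    fields = line.rstrip('\r\n').split(TAB)   # keep a trailing empty field
    word, tag, rest = fields[0], fields[1], fields[2:]
    return TAB.join([word] + tags[tag] + rest)

print(repr(retag_line('w1\tcase_def_acc\tyes\n')))  # 'w1\tcase_def\tacc\tnull\tyes'
print(repr(retag_line('w3\tcase_def_gen\t\n')))     # 'w3\tcase_def\tgen\tnull\t'
print(repr(retag_line('w2\tnoun_prop\n')))          # 'w2\tnoun_prop\tnull\tnull'
```

With this shape, a line whose yes/no field is empty still ends in a tab after the rewrite, and nothing has to be invented for the missing value.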
I appreciate your support!
--dan