[Tutor] Replacing fields in lines of various lengths

Dan Liang danliang20 at gmail.com
Tue May 5 06:22:45 CEST 2009


(Please disregard my earlier message that was sent by mistake before I
finished composing. Sorry about that! :().

Hello Spir, Alan, and Paul, and tutors,

Thank you Spir, Alan, and Paul for your help with my previous code! Earlier,
I was asking how to separate a composite tag like the one in field 2 below
with sub-tags like those in the values of the dictionary below. In my
original question, I was asking about data formatted as follows:

w1    \t   case_def_acc
w2    \t   noun_prop
w3    \t   case_def_gen
w4    \t   dem_pron_f


I put together the code below based on your suggestions, with minor
changes, and it works.


-------------Begin code----------------------------

#!/usr/bin/python
import sys

# Map each composite tag to its three sub-tags.
tags = {
'noun_prop': 'noun_prop null null'.split(),
'case_def_gen': 'case_def gen null'.split(),
'dem_pron_f': 'dem_pron f null'.split(),
'case_def_acc': 'case_def acc null'.split(),
}


TAB = '\t'


def newlyTaggedWord(line):
    line = line.rstrip()     # strip line ending
    (word,tag) = line.split(TAB)    # separate parts of line, keeping data only
    new_tags = tags[tag]          # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging   # formatted result

def replaceTagging(source_name, target_name):
    source_file = open(source_name, "r")
    target_file = open(target_name, "w")
    # replacement loop
    for line in source_file:
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)
    # close the file objects (not the file names) once the loop is done
    source_file.close()
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)



-------------End code----------------------------


Now I have to work on a different data format, as follows:

-------------Begin data----------------------------

w1    \t   case_def_acc   \t          yes
w2    \t   noun_prop   \t               no
w3    \t   case_def_gen   \t
w4    \t   dem_pron_f   \t             no
w3    \t   case_def_gen   \t
w4    \t   dem_pron_f   \t             no
w1    \t   case_def_acc   \t          yes
w3    \t   case_def_gen   \t
w3    \t   case_def_gen   \t

-------------End data----------------------------
Notice that some lines have nothing in the yes/no field, and hence end in a
tab.
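
As a small illustration of what such lines look like once split (a sketch
that assumes fields are separated by single literal tab characters; the
sample strings are made up to match the data above): stripping only the
newline keeps the empty last field, whereas a bare rstrip() also removes the
trailing tab and leaves just two fields.

```python
TAB = "\t"

# Sample lines modeled on the data above (assumed to use real tab characters).
with_decision = "w1\tcase_def_acc\tyes\n"
without_decision = "w3\tcase_def_gen\t\n"

# Stripping only the newline keeps the empty third field.
print(with_decision.rstrip("\n").split(TAB))     # ['w1', 'case_def_acc', 'yes']
print(without_decision.rstrip("\n").split(TAB))  # ['w3', 'case_def_gen', '']

# A bare rstrip() removes all trailing whitespace, including the tab,
# so the same line splits into only two fields.
print(without_decision.rstrip().split(TAB))      # ['w3', 'case_def_gen']
```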

My question is how to replace the data in the composite-tag field with
sub-tags like those in the dictionary values above, and still be able to
print the whole line with only this change (i.e., composite tags replaced by
sub-tags). Earlier, we read the word and tag from each line directly and
looked the tag up in the dictionary, since we were sure each line had 2
fields after splitting on tabs. Here, lines have a varying number of fields:
some end with yes or no, and some do not.
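
One possible way to do this, offered only as a sketch: split the line,
replace just the second field with its sub-tags, and join everything back so
that any remaining fields pass through untouched. The name retag_line is
hypothetical, the dictionary repeats the entries above written out as lists
(with the noun_prop key spelled as in the data), and the code assumes the tag
is always the second field and that fields may carry stray spaces around the
tabs.

```python
TAB = "\t"

# Same entries as the dictionary above, written out as lists.
tags = {
    "noun_prop": ["noun_prop", "null", "null"],
    "case_def_gen": ["case_def", "gen", "null"],
    "dem_pron_f": ["dem_pron", "f", "null"],
    "case_def_acc": ["case_def", "acc", "null"],
}

def retag_line(line):
    """Hypothetical helper: expand the composite tag in field 2, keep the rest."""
    # Strip only the newline so an empty trailing field survives the split.
    fields = [f.strip() for f in line.rstrip("\n").split(TAB)]
    word, tag, rest = fields[0], fields[1], fields[2:]
    return TAB.join([word] + tags[tag] + rest)

print(repr(retag_line("w1\tcase_def_acc\tyes\n")))  # 'w1\tcase_def\tacc\tnull\tyes'
print(repr(retag_line("w3\tcase_def_gen\t\n")))     # 'w3\tcase_def\tgen\tnull\t'
```

Because rest is simply whatever follows the tag, the same function would also
handle the earlier two-field lines, where rest is an empty list.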

I tried to change the code above by modifying the function where we read
the dictionary, but it did not work. While it is ugly, I include it as
proof that I have worked on the problem. I am sure you will have various
nice ideas.


-------------Begin code----------------------------
def newlyTaggedWord(line):
    tagging = ""
    line = line.split(TAB)    # separate parts of line, keeping data only
    if len(line)==3:
        word = line[-3]
        tag = line[-2]
        new_tags = tags[tag]
        # in decision I wanted to store either yes or no if one of
        # these existed
        decision = line[-1]

    elif len(line)==2:
        word = line[-2]
        tag = line[-1]
        # I thought if it is a must to put something in decision while
        # decision is really absent in line, I would put a tab. But I
        # really want to avoid putting anything there.
        decision = TAB

        new_tags = tags[tag]          # read in dict
        tagging = TAB.join(new_tags)    # join with TABs
        return word + TAB + tagging + TAB + decision
-------------End code----------------------------
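
A likely reason the attempt above produces no usable result, shown as a
reduced sketch (branch_demo is a hypothetical stand-in, not the original
function): the only return statement sits inside the elif branch, so any line
that splits into three parts falls off the end of the function and yields
None. Since the line is not stripped before splitting, even a line with an
empty yes/no field splits into three parts, the last one being the newline.

```python
TAB = "\t"

def branch_demo(line):
    """Hypothetical reduction of the attempt: return only in the elif branch."""
    fields = line.split(TAB)       # no rstrip, as in the attempt
    if len(fields) == 3:
        decision = fields[-1]      # computed but never returned
    elif len(fields) == 2:
        return "two fields"

# A three-part line reaches the end of the function without a return.
result = branch_demo("w1\tcase_def_acc\tyes\n")
print(repr(result))                # None

# Without stripping, the "empty" last field is actually the newline.
parts = "w3\tcase_def_gen\t\n".split(TAB)
print(parts)                       # ['w3', 'case_def_gen', '\n']
```

Concatenating that None with '\n' in the calling loop would then raise a
TypeError.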


I appreciate your support!

--dan