[Tutor] Replacing fields in lines of various lengths

Tue May 5 11:43:22 CEST 2009

"Dan Liang" <danliang20 at gmail.com> wrote

> And I put together the code below based on your suggestions, with minor
> changes and it does work.

Good, now your question is?

-------------Begin code----------------------------

#!/usr/bin/python
import sys

tags = {
'noun_prop': 'noun_prop null null'.split(),
'case_def_gen': 'case_def gen null'.split(),
'dem_pron_f': 'dem_pron f null'.split(),
'case_def_acc': 'case_def acc null'.split(),
}

TAB = '\t'

def newlyTaggedWord(line):
       line = line.rstrip()     # I strip line ending
       (word,tag) = line.split(TAB)    # separate parts of line, keeping data only
       new_tags = tags[tag]          # read in dict
       tagging = TAB.join(new_tags)    # join with TABs
       return word + TAB + tagging   # formatted result

def replaceTagging(source_name, target_name):
       target_file = open(target_name, "w")
       # replacement loop
       for line in open(source_name, "r"):
           new_line = newlyTaggedWord(line) + '\n'
           target_file.write(new_line)

source_name.close()
target_file.close()

AG> These two lines should be inside the function, after the loop.

if __name__ == "__main__":
       source_name = sys.argv[1]
       target_name = sys.argv[2]
       replaceTagging(source_name, target_name)

-------------End code----------------------------
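
The AG> note about the two close() calls could be sketched as below. This is only one possible arrangement, not necessarily what either poster ended up with: it uses `with` blocks (an assumption; explicit close() calls after the loop would do the same job) so that both files are closed once the loop has finished. Note that source_name is a string, so it is the file object opened from it that gets closed. The tag table is trimmed to two entries for brevity.

```python
TAB = '\t'

# Trimmed copy of the tag table above, for illustration only
tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
}

def newlyTaggedWord(line):
    line = line.rstrip()                  # strip line ending
    (word, tag) = line.split(TAB)         # exactly two fields expected
    return word + TAB + TAB.join(tags[tag])

def replaceTagging(source_name, target_name):
    # source_name and target_name are file names (strings); the file
    # objects opened from them are what get closed.
    with open(source_name, "r") as source_file:
        with open(target_name, "w") as target_file:
            for line in source_file:
                target_file.write(newlyTaggedWord(line) + '\n')
    # both files are closed at this point, after the loop
```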

Now I have to work on a different data format, as follows:

-------------Begin data----------------------------

w1    \t   case_def_acc   \t          yes
w2    \t   noun_prop   \t               no
w3    \t   case_def_gen   \t
w4    \t   dem_pron_f   \t             no
w3    \t   case_def_gen   \t
w4    \t   dem_pron_f   \t             no
w1    \t   case_def_acc   \t          yes
w3    \t   case_def_gen   \t
w3    \t   case_def_gen   \t

-------------End data----------------------------
Notice that some lines have nothing in the yes-no field, and hence end in a
tab.
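
A small sketch of what that trailing tab means when splitting, assuming real tab characters separate the fields (the spaces and `\t` in the sample being only a way of displaying them). The two sample strings are made up for illustration. A bare rstrip() removes a trailing tab along with the newline, so only the newline is stripped here:

```python
TAB = '\t'

# Hypothetical sample lines in the format shown above
with_decision = 'w1\tcase_def_acc\tyes\n'
without_decision = 'w3\tcase_def_gen\t\n'

# Strip only the newline, so an empty last field survives the split
fields_a = with_decision.rstrip('\n').split(TAB)
fields_b = without_decision.rstrip('\n').split(TAB)

print(fields_a)   # ['w1', 'case_def_acc', 'yes']
print(fields_b)   # ['w3', 'case_def_gen', '']

# By contrast, a bare rstrip() also eats the trailing tab
fields_c = without_decision.rstrip().split(TAB)
print(fields_c)   # ['w3', 'case_def_gen']
```

Both kinds of line then split into three fields, the last one simply being empty when there is no yes or no.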

My question is how to replace the data in the field of composite tags with
sub-tags like those in the dictionary values above, and still be able to
print the whole line with only this change (i.e., composite tags replaced by
sub-tags). Earlier, we read the word and tag from each line and looked the
tag up in the dictionary directly, since we were sure each line had 2 fields
after splitting on tabs. Here, lines have varying numbers of fields: some end
in yes or no, and some do not.

I tried to change the code above by modifying the function where we read the
dictionary, but it did not work. While it is ugly, I include it as proof
that I have worked on the problem. I am sure you will have various nice
ideas.

-------------Begin code----------------------------
def newlyTaggedWord(line):
       tagging = ""
       line = line.split(TAB)    # separate parts of line, keeping data only
       if len(line)==3:
           word = line[-3]
           tag = line[-2]
           new_tags = tags[tag]
           decision = line[-1]

           # in decision I wanted to store either yes or no, if one of
           # these existed

       elif len(line)==2:
           word = line[-2]
           tag = line[-1]
           decision = TAB

           # I thought that if something must go in decision when it is
           # really absent from the line, I would put a tab. But I really
           # want to avoid putting anything there.

           new_tags = tags[tag]          # read in dict
           tagging = TAB.join(new_tags)    # join with TABs
           return word + TAB + tagging + TAB + decision
-------------End code----------------------------

I appreciate your support!

--dan
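
One possible way to handle lines with a varying number of fields is sketched below. It is a suggestion under stated assumptions, not a tested fix for the real data: it assumes the fields are separated by real tab characters, that the composite tag is always the second field, and that every tag that occurs is a key in the tags dictionary (an unknown tag raises KeyError). Instead of testing the number of fields, it replaces just the tag field by its sub-tags and passes everything else through.

```python
TAB = '\t'

tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

def newlyTaggedWord(line):
    # Strip only the newline: rstrip() with no argument would also
    # remove a trailing tab, and with it the empty yes/no field.
    fields = line.rstrip('\n').split(TAB)
    # Assumption: the composite tag is always the second field.
    tag = fields[1].strip()
    # Replace that one field by its sub-tags; the word before it and
    # any fields after it (yes, no, or empty) pass through untouched.
    fields[1:2] = tags[tag]
    return TAB.join(fields)
```

With this version a line such as 'w1\tcase_def_acc\tyes\n' comes back as 'w1\tcase_def\tacc\tnull\tyes', and a line ending in a tab keeps its empty last field, so nothing needs to be invented for decision when it is absent.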
