[Tutor] html color coding: where to start

Thu Sep 17 17:44:06 CEST 2009

Hi Tutors,
I want to color-code the different parts of each word in a morphologically
complex natural language. The file I have looks like this, where the first
column is the word and the second is the composite part-of-speech tag. For
example, Al is a DETERMINER, wlAy is a NOUN, and At is a PLURAL NOUN SUFFIX:

Al+wlAy+At        DET+NOUN+NSUFF_FEM_PL
Al+mtHd+p        DET+ADJ+NSUFF_FEM_SG
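
A minimal sketch of how one such line could be split into (segment, tag)
pairs, assuming the two columns are separated by whitespace and both use
"+" as the separator. The name parse_line is hypothetical, chosen only for
illustration:

```python
def parse_line(line):
    """Split one 'word<whitespace>tags' line into (segment, tag) pairs."""
    fields = line.split()
    if len(fields) != 2:
        return []  # skip blank or malformed lines
    word, pos = fields
    # Pair each word segment with the tag in the same position
    return list(zip(word.split("+"), pos.split("+")))

print(parse_line("Al+wlAy+At        DET+NOUN+NSUFF_FEM_PL"))
# prints [('Al', 'DET'), ('wlAy', 'NOUN'), ('At', 'NSUFF_FEM_PL')]
```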

The output I want is one in which the word has no plus signs and each
segment is color-coded by grammatical category. For example, the noun is
red, the det is green, and the suffix is orange, like on this page:
http://docs.google.com/View?id=df7jv9p9_3582pt63cc4
I am stuck on the HTML part and I don't know where to start. I have no
experience with HTML, but I have this skeleton (which may not be the right
thing anyway).
Any help with materials, modules, or suggestions is appreciated.

The skeleton of my program is as follows:

#############
RED = ("NOUN", "ADJ")
GREEN = ("DET", "DEMON")
ORANGE = ("NSUFF", "VSUFF", "ADJSUFF")
# print html head
def print_html_head():
    # print the head of the html page
    pass

def print_html_tail():
    # print the tail of the html page
    pass

def color(segment, color):
    # STUCK HERE: should take a color and a segment, for example
    pass

# main
import sys
infile = open(sys.argv[1]) # takes as input the POS-tagged file
print_html_head()
for line in infile:
    line = line.split()
    if len(line) != 2: continue
    word = line[0]
    pos = line[1]
    zipped = zip(word.split("+"), pos.split("+"))

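    for x, y in zipped:
        tag = y.split("_")[0]  # NSUFF_FEM_PL -> NSUFF
        if tag in RED:
            color(x, "#FF0000")  # red
        elif tag in GREEN:
            color(x, "#008000")  # green
        elif tag in ORANGE:
            color(x, "#FFA500")  # orange
        else:
            color(x, "#0000FF")  # blue for anything else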

print_html_tail()
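
One possible way to fill in the missing HTML part, offered as a sketch
rather than a definitive solution: wrap each segment in a <span> with an
inline CSS color and join the segments without the plus signs. The names
TAG_COLORS, DEFAULT_COLOR, render_word, and render_page are hypothetical,
and the color assignments simply follow the RED/GREEN/ORANGE groups above.
It assumes Python 3 (for html.escape) and that tags such as NSUFF_FEM_PL
should be matched on the part before the first underscore.

```python
import html

# Hypothetical table: base tag -> CSS color name (assumed values)
TAG_COLORS = {
    "NOUN": "red", "ADJ": "red",
    "DET": "green", "DEMON": "green",
    "NSUFF": "orange", "VSUFF": "orange", "ADJSUFF": "orange",
}
DEFAULT_COLOR = "blue"  # assumed fallback for tags not listed above

def color(segment, css_color):
    """Wrap one segment in a <span> with an inline color style."""
    return '<span style="color: %s">%s</span>' % (css_color, html.escape(segment))

def render_word(word, pos):
    """Return HTML for one word: segments joined without '+', each colored."""
    parts = []
    for segment, tag in zip(word.split("+"), pos.split("+")):
        base = tag.split("_")[0]  # NSUFF_FEM_PL -> NSUFF
        parts.append(color(segment, TAG_COLORS.get(base, DEFAULT_COLOR)))
    return "".join(parts)

def render_page(lines):
    """Build a complete HTML page from an iterable of tagged lines."""
    body = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        body.append("<p>%s</p>" % render_word(fields[0], fields[1]))
    head = '<html><head><meta charset="utf-8"></head><body>'
    tail = "</body></html>"
    return "\n".join([head] + body + [tail])

sample = ["Al+wlAy+At        DET+NOUN+NSUFF_FEM_PL",
          "Al+mtHd+p        DET+ADJ+NSUFF_FEM_SG"]
print(render_page(sample))
```

Returning strings instead of printing inside each function makes it easy
to write the page to a file or to check individual pieces by hand.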

-- 
لا أعرف مظلوما تواطأ الناس علي هضمه ولا زهدوا في إنصافه كالحقيقة.....محمد
الغزالي
"No victim has ever been more repressed and alienated than the truth"

Emad Soliman Nawfal
Indiana University, Bloomington