[Tutor] Re: txt to xml using dom

Tom Brownlee tompol@hotmail.com
Fri Jun 13 17:47:01 2003


Hello all, I've posted before, but it was mentioned that my message was vague, so...

I'm a tertiary student from New Zealand doing Python programming at the 
moment. We have an assignment that involves using DOM (minidom). If 
possible, can anybody help me?

I have to take a course outline in .txt format and pass it through a Python 
program that writes the same content back out marked up with XML tags. That 
is to say: .txt document in = XML out (in a .txt file).

The course outline has headings and some subheadings that must be 'tagged', 
with the text in between left as is in the output file, between the tags.

I have written the program but it doesn't quite work. I have attached it 
below to see if you can make sense of why it doesn't work.

Thank you very much for your help.

The output I get is this:
<?xml version="1.0" ?>
<2003 Course Outline/>
...and that's it.
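
A side note on that output: <2003 Course Outline/> is not well-formed XML, 
because an element name cannot contain spaces or begin with a digit. Below 
is a minimal sketch of a helper that turns heading text into a usable name. 
The name to_tag_name is made up for illustration (it is not part of 
minidom), and it assumes that only ASCII letters, digits and underscores 
should survive.

```python
import re


def to_tag_name(heading):
    """Turn heading text into a string usable as an XML element name.

    Hypothetical helper, not part of minidom.
    """
    # Collapse every run of other characters (spaces, tabs, punctuation,
    # the trailing newline) into a single underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", heading.strip()).strip("_")
    # A name may not be empty or start with a digit, so prefix those cases.
    if not name or not name[0].isalpha():
        name = "_" + name
    return name
```

With this, "2003 Course Outline" would become _2003_Course_Outline, and a 
heading such as "COURSE STAFF MEMBERS" would become COURSE_STAFF_MEMBERS.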

I've also included below the small course outline to be parsed by the 
program.

P.S.
If possible, can I have the corrected code? This may seem demanding and lazy 
on my part, but I have spent many hours on this problem and am utterly 
frustrated that it doesn't work, and as a beginner I'm losing faith in Python 
altogether :(



<start of code>

import re
from xml.dom.minidom import *

def main(arg):
    try:
        f = open(arg)
    except:
        print "cannot open file"

    newdocument = Document()
    rootElement = newdocument.createElement("2003 Course Outline")
    newdocument.appendChild(rootElement)

    tagSequence = re.compile("(^\d+)\t+")
    while 1:
        line = f.readline()
        if len(line) == 0:
            break

        s = line
        target = tagSequence.search(s)
        if target:
            s2 = re.search("\t", s)
            result = s[s2.span()[1]:]
            newElement = newdocument.createElement(result)
            rootElement.appendChild(newElement)

    x = newdocument.toxml()
    f=open('CourseOutlineInXml.txt', 'w')
    f.write(x)
    print x

if __name__ == '__main__':
    main("CourseOutline.txt")

<end of code>
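
For comparison, here is a minimal sketch of one way the program above could 
be restructured. It is not the only possible fix and rests on some 
assumptions: a heading is any line that starts with a number followed by 
whitespace (so spaces work as well as tabs), each heading becomes a 
<section> element with the heading text stored in a "title" attribute, and 
the text in between is copied unchanged into that section. The names 
CourseOutline, section, outline_to_xml and convert_file are invented for 
this sketch.

```python
import re
from xml.dom.minidom import Document

# Assumption: a heading line starts with a number followed by whitespace
# (a tab in the outline), e.g. "1<TAB>COURSE STAFF MEMBERS".
HEADING = re.compile(r"^(\d+)\s+(\S.*)$")


def outline_to_xml(lines):
    """Return an XML string built from the lines of a course outline."""
    doc = Document()
    # Element names cannot contain spaces or start with a digit,
    # so the year goes into an attribute instead of the name.
    root = doc.createElement("CourseOutline")
    root.setAttribute("year", "2003")
    doc.appendChild(root)

    section = None  # the <section> element currently being filled
    for line in lines:
        match = HEADING.match(line.rstrip("\n"))
        if match:
            # Start a new section; the heading text becomes an attribute.
            section = doc.createElement("section")
            section.setAttribute("number", match.group(1))
            section.setAttribute("title", match.group(2).strip())
            root.appendChild(section)
        elif section is not None:
            # Everything else is copied unchanged into the open section.
            section.appendChild(doc.createTextNode(line))
    return doc.toxml()


def convert_file(in_path, out_path):
    """Read the outline from in_path and write the XML to out_path."""
    with open(in_path) as source:
        xml_text = outline_to_xml(source)
    with open(out_path, "w") as target:
        target.write(xml_text)
    return xml_text
```

Calling convert_file("CourseOutline.txt", "CourseOutlineInXml.txt") would 
then play the role of main() in the original. Two caveats: any text before 
the first heading is dropped, and a body line that happens to start with a 
number followed by a space would be mistaken for a heading.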

<start course document>
1	COURSE STAFF MEMBERS

	(a)	Course Academic Staff Member
Rob Oliver - Room number S662.  Contact number 940 8556
Email:  oliverr@cpit.ac.nz


	(b)	Programme Leader

Trevor Nesbit, Room number N215.  Contact number 940 8703
Email:  nesbitt@cpit.ac.nz

(c)	Course Co-ordinator

Dr Mike Lance - Room number S661 Contact number 940 8318
Email: lancem@cpit.ac.nz

(d) 	Head of School (Acting)

		Janne Ross, Room number S176, Contact number 940 8537
Email:  rossj@cpit.ac.nz

2	MATERIALS

	NIL


3	CLASS HOURS AND TIMES

Day	Time	Room
Tuesday	10:00 - 12:00	X307
Thursday	10:00 - 12:00	L249


4	REFERENCE TO STUDENT HANDBOOKS

	Students should obtain a copy of the following

	Christchurch Polytechnic Student Handbook
	Faculty of Commerce Student Handbook
	Programme Handbook

Each of these contains information to students about a range of policies and 
procedures.

<end of course document>
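
The outline above mixes numbered headings ("1", "2", ...) with lettered 
subheadings ("(a)", "(b)", ...) that are indented inconsistently. A small 
sketch of how each line could be classified before any tags are created 
follows; the patterns and the classify function are assumptions based only 
on this sample, not a general outline format.

```python
import re

# Numbered heading: digits, whitespace, then the heading text.
HEADING = re.compile(r"^(\d+)\s+(\S.*)$")
# Lettered subheading: optional indent, "(a)", whitespace, then the text.
SUBHEADING = re.compile(r"^\s*\(([a-z])\)\s+(\S.*)$")


def classify(line):
    """Return (kind, label, text) where kind is 'heading', 'subheading' or 'text'."""
    stripped = line.rstrip("\n")
    match = HEADING.match(stripped)
    if match:
        return ("heading", match.group(1), match.group(2).strip())
    match = SUBHEADING.match(stripped)
    if match:
        return ("subheading", match.group(1), match.group(2).strip())
    # Anything else is body text and is passed through unchanged.
    return ("text", None, stripped)
```

For example, the line "(d) <TAB>Head of School (Acting)" would be reported 
as a subheading with label "d", while the timetable row 
"Day<TAB>Time<TAB>Room" stays plain text.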
