Editing MS Word

tonylabarbara at aol.com tonylabarbara at aol.com
Fri Oct 26 21:49:06 CEST 2007


I'm trying to edit MS Word tables with a Python script. Here's a snippet:

import string

def msw2htmlTables():
    input = "/usr/home/me/test.doc"
    input = open(input,'r')
    word = "whatever"
    inputFlag = 0
    splitString = []
    for line in input:
        # Check first the inputFlag, since we only want to delete the top
        if inputFlag == 0:
            splitString = line.split(word)
            keep = splitString[1]
            keep = "nada"
            print len(splitString)
            inputFlag = 1
        elif inputFlag == 1:
            # This means we've deleted the top junk. Let's search for the bottom junk.
            splitString = line.split(word)
            keep = splitString[0]
            inputFlag = 2
            print len(splitString)
            keep += line
        elif inputFlag == 2:
            # This means everything else is junk.
            pass

Now, if the variable "word" is "orange", it never prints the length of splitString. If it's "dark", it does. The only difference is how the two words appear in the document: "orange" has a space character to its left and some MS garbage character to its right, while "dark" has a space character to its left and a comma to its right. Furthermore, if I use MSW junk characters as the definition of "word" (such as " Ù ", which is what I really need to search for), the script never even compiles (it complains of an unpaired quote). It appears that Python doesn't like MSW's junk characters. What shall I do?
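
For illustration, here is a minimal sketch of the same kind of split done on raw bytes instead of text lines, so the junk character can be written as an escape rather than pasted into the source. It is written for Python 3, not the Python 2 of the snippet above. The helper name split_on_marker and the sample bytes are hypothetical, and it assumes the junk character is stored in the file as the single byte 0xD9, which may not match the real document.

```python
# Hypothetical helper, not part of the original script.
def split_on_marker(data: bytes, marker: bytes):
    """Return (before, after) around the first marker, or None if it is absent."""
    before, sep, after = data.partition(marker)
    if not sep:
        # The marker does not occur, so there is nothing to split on.
        return None
    return before, after

# Stand-in bytes for the test; a real run would use open(path, "rb").read().
sample = b"\xd0\xcf\x11\xe0 junk orange\x07more \xd9 text"

# Assumption: the junk character is the single byte 0xD9 with a space on each side.
marker = b" \xd9 "

result = split_on_marker(sample, marker)
orange = split_on_marker(sample, b"orange")
missing = split_on_marker(sample, b"absent")
```

Checking for None before indexing avoids assuming that the marker is present in the data being split.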



