Simple Text Processing Help

patrick.waldo
Tue Oct 16 14:47:51 CEST 2007

And now for something completely different...

I've been reading up a bit about Python and Excel, and I was able to
get the program to output to Excel quite easily.  However, what if the
input file were a Word document?  I can't seem to find much
information about parsing Word files.  What could I add to make the
same program work for a Word file?

Again, thanks a lot.

And the Excel add-on...

import codecs
import re
from win32com.client import Dispatch

path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"
input = codecs.open(path, 'r', 'utf8')
output = codecs.open(path2, 'w', 'utf8')

NR_RE = re.compile(r'^\d+-\d+-\d+$')           #pattern for EINECS

tokens = input.read().split()                  # assumption: the input is split on whitespace

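def iter_elements(tokens):
    # Group tokens into products: [number, number, name, formula]
    product = []
    for tok in tokens:
        # a number token after at least 4 collected tokens starts a new product
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]   # merge the name words
            yield product
            product = []
        product.append(tok)
    # the last product has no following number to trigger the merge
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product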

xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()          # a freshly started Excel may have no workbook open
sheet = xlApp.ActiveSheet
c = 1                          # current row

for element in iter_elements(tokens):
    # one product per row, one field per column
    sheet.Cells(c,1).Value = element[0]
    sheet.Cells(c,2).Value = element[1]
    sheet.Cells(c,3).Value = element[2]
    sheet.Cells(c,4).Value = element[3]
    c = c + 1

xlApp.Visible = 0
del xlApp
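
To check the grouping step without Excel, here is a minimal sketch of what
iter_elements is meant to do. The token list is made up for illustration
(these are not real EINECS entries), and the function is repeated so the
snippet runs on its own. It assumes each token is appended to the current
product and that the last product is merged the same way as the others:

```python
import re

NR_RE = re.compile(r'^\d+-\d+-\d+$')   # same pattern as in the script

def iter_elements(tokens):
    product = []
    for tok in tokens:
        # a number token after at least 4 collected tokens starts a new product
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]   # merge the name words
            yield product
            product = []
        product.append(tok)
    # assumption: the final product should be merged like the others
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

# hypothetical sample data: number, number, multi-word name, formula
sample_tokens = ("100-000-1 10-00-1 alpha beta acid C2H4 "
                 "100-000-2 10-00-2 gamma delta H2O").split()

products = list(iter_elements(sample_tokens))
for row in products:
    print(row)
```

Each product should come out as four fields, which is what the four
Cells(c, n) assignments in the Excel loop expect.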

