Simple Text Processing Help
patrick.waldo at gmail.com
Tue Oct 16 08:47:51 EDT 2007
And now for something completely different...
I've been reading up a bit about Python and Excel, and I got the
program to output to Excel quite easily. However, what if the
input file were a Word document? I can't seem to find much
information about parsing Word files. What could I add to make the
same program work for a Word file?
Again thanks a lot.
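One possible direction, as a rough sketch only: Word can be automated through the same win32com Dispatch mechanism used for Excel, and the document's plain text can then be split into tokens exactly like the contents of the text file. The function name read_word_text and its dispatch parameter are made up for illustration, and the sketch assumes Word and pywin32 are installed on the machine.

```python
def read_word_text(doc_path, dispatch=None):
    """Return the plain text of a Word document via COM automation.

    `dispatch` is a hypothetical hook: by default it is
    win32com.client.Dispatch, but any callable returning an object with
    the same attributes can be passed in (handy for trying this out
    on a machine without Word).
    """
    if dispatch is None:
        # imported here so the function can be defined without pywin32
        from win32com.client import Dispatch as dispatch
    word = dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(doc_path)
        try:
            # Content is the main text of the document; paragraphs are
            # separated by '\r', which str.split() treats as whitespace
            return doc.Content.Text
        finally:
            doc.Close(0)  # 0 = close without saving changes
    finally:
        word.Quit()

# Assumed usage (the .doc path is hypothetical):
# tokens = read_word_text("c:\\text_samples\\chem_1.doc").split()
```

If this works as assumed, only the line that builds tokens would need to change; the rest of the program stays the same.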
And the Excel add-on...
import codecs
import re
from win32com.client import Dispatch
path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"
input = codecs.open(path, 'r','utf8')
output = codecs.open(path2, 'w', 'utf8')
NR_RE = re.compile(r'^\d+-\d+-\d+$') #pattern for EINECS number
tokens = input.read().split()
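def iter_elements(tokens):
    # Collect tokens into one list per product; a new product starts at a
    # number-like token once the current one has at least four tokens.
    product = []
    for tok in tokens:
        if NR_RE.match(tok) and len(product) >= 4:
            # merge the tokens between the second field and the last one
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    # the last product has no following number, so join and yield it here
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product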
xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()
c = 1
for element in iter_elements(tokens):
    # write the first four fields of each product into columns 1-4 of row c
    for col in range(4):
        xlApp.ActiveSheet.Cells(c, col + 1).Value = element[col]
    c = c + 1
xlApp.ActiveWorkbook.Close(SaveChanges=1)
xlApp.Quit()
del xlApp
input.close()
output.close()
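To check the grouping logic without starting Excel, the generator can be run on a few sample tokens. This is only a sketch: the function is a copy of the one above with the last record joined the same way as the others, and the sample line is made up for illustration (the second record in particular is a placeholder, not real data from the input file).

```python
import re

NR_RE = re.compile(r'^\d+-\d+-\d+$')  # pattern for a number such as 200-001-8

def iter_elements(tokens):
    product = []
    for tok in tokens:
        # a number-like token starts a new product once the current one
        # already holds at least four tokens
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    # join and yield the final product as well
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

# illustrative input only; the second record is a placeholder
sample = "200-001-8 50-00-0 formaldehyde CH2O 000-000-0 00-00-0 some long name X2Y3"
rows = list(iter_elements(sample.split()))
for row in rows:
    print(row)
```

Each row comes out as a list of four fields, with the multi-word name merged into a single string, which is the shape the Excel loop expects.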