Simple Text Processing Help
patrick.waldo at gmail.com
Tue Oct 16 08:45:01 EDT 2007
And now for something completely different...
I see a lot of COM examples that use Python with Excel, and I quickly
made the same program write its output to Excel. What if the input file
were a Word document? Where can I find information about manipulating
Word documents, or what could I add to make the same program work with
Word?
Again, thanks a lot. I'll start hitting some books on this sort of
text manipulation.
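Word can be scripted through COM much like Excel (the `Word.Application` ProgID, then `Documents.Open` and `Document.Content.Text`), and that is the route for old binary `.doc` files. For `.docx` files (the XML format introduced with Office 2007), a COM-free option is to read the text straight out of the package: a `.docx` is a zip archive whose main body lives in `word/document.xml`. The sketch below assumes a `.docx` input; `docx_text` is a hypothetical helper name, and it ignores headers, footers and footnotes, which are stored in separate parts of the package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper: `source` may be a filename or a binary file object.
    """
    with zipfile.ZipFile(source) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # a paragraph's text is split across <w:t> runs; glue them back together
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Its result can stand in for the file contents at the `read().split()` step, e.g. `tokens = docx_text(some_docx_path).split()`, where `some_docx_path` is a placeholder.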
The Excel add-on:
import codecs
import re
from win32com.client import Dispatch

path = "c:\\text_samples\\chem_1_utf8.txt"

NR_RE = re.compile(r'^\d+-\d+-\d+$')  # pattern for EINECS number

# read the whole file and split it into whitespace-separated tokens
infile = codecs.open(path, 'r', 'utf8')
tokens = infile.read().split()
infile.close()

def iter_elements(tokens):
    """Group tokens into records; a new record starts at each EINECS number."""
    product = []
    for tok in tokens:
        if NR_RE.match(tok) and len(product) >= 4:
            # merge the tokens after the first two fields and before
            # the last one into a single name field
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    if len(product) >= 4:
        # the last record needs the same merge
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
wb = xlApp.Workbooks.Add()
sheet = wb.ActiveSheet
for row, element in enumerate(iter_elements(tokens), start=1):
    # write up to four fields per record into columns A-D
    for col, value in enumerate(element[:4], start=1):
        sheet.Cells(row, col).Value = value
wb.Close(SaveChanges=1)
xlApp.Quit()
del xlApp
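If Excel automation is not available (for instance off Windows), one alternative is to write the same records to a CSV file, which Excel opens directly. This is a sketch under stated assumptions: `write_records_csv` is a hypothetical helper, it expects records shaped like those produced by `iter_elements`, and it pads short records so every row has four columns.

```python
import csv
import io

def write_records_csv(records, fileobj):
    """Write the first four fields of each record as one CSV row.

    Hypothetical helper: short records are padded with empty strings.
    """
    writer = csv.writer(fileobj)
    for record in records:
        fields = list(record[:4])
        fields += [""] * (4 - len(fields))
        writer.writerow(fields)
```

For a real file (Python 3), open it with `open(csv_path, "w", newline="", encoding="utf-8-sig")` and pass `iter_elements(tokens)` as `records`, where `csv_path` is a placeholder. The csv module documentation asks for `newline=""`, and the byte-order mark written by `utf-8-sig` helps Excel recognise the file as UTF-8. Names containing commas, such as "1,2-dichloroethane", are quoted automatically.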