Analysing Word documents (slow) What's wrong with this code please!

jmdeschamps jmdeschamps at cvm.qc.ca
Mon Jan 19 21:11:16 CET 2004


Eric Brunel <eric.brunel at N0SP4M.com> wrote in message news:<bugpf4$i7$1 at news-reader4.wanadoo.fr>...
> jmdeschamps wrote:
> > Does anyone have a hint on how else to get faster results?
> > (This is to find out what was bold in the document, in order to take
> > documents produced in Word and generate HTML (web pages) and XML
> > (straight data) versions.)
> > 
> > # START ========================
> > import win32com.client
> > import tkFileDialog, time
> > 
> > # Launch Word
> > MSWord = win32com.client.Dispatch("Word.Application")
> > 
> > myWordDoc = tkFileDialog.askopenfilename()
> > 
> > MSWord.Documents.Open(myWordDoc)
> > 
> > boldRanges=[]  #list of bold ranges
> > boldStart = -1
> > boldEnd = -1
> > t1= time.clock()
> > for i in range(len(MSWord.Documents[0].Content.Text)):
> >     if MSWord.Documents[0].Range(i,i+1).Bold  : # testing for bold property
> 
> Knowing vaguely how pythoncom works, you'd do better to avoid asking for 
> MSWord.Documents[0] at each loop step: pythoncom dynamically fetches the COM 
> objects corresponding to every attribute and method you ask for, and that may 
> cost a lot of time. So doing:
> 
> doc = MSWord.Documents[0]
> for i in range(len(doc.Content.Text)):
>    if doc.Range(i,i+1).Bold: ...
> 
> may greatly improve performance.
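
For reference, a minimal sketch of that suggestion filled out into a complete loop. The function name bold_ranges and the (start, end) span format are assumptions for illustration, not part of the original script. It takes any object that exposes Content.Text and Range(start, end).Bold, so the Word document is fetched once by the caller and the text length is read once instead of on every pass.

```python
def bold_ranges(doc):
    """Return a list of (start, end) character spans that are bold.

    `doc` is assumed to behave like a Word Document COM object:
    doc.Content.Text is the document text and doc.Range(a, b).Bold is
    truthy when the characters in [a, b) are bold. `end` is exclusive.
    """
    # Read the text length once: each attribute access on a COM object
    # is a separate call into Word.
    text_length = len(doc.Content.Text)

    spans = []
    start = None  # start of the bold run currently being scanned, if any
    for i in range(text_length):
        if doc.Range(i, i + 1).Bold:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        # The document ended while still inside a bold run.
        spans.append((start, text_length))
    return spans


# Possible usage on Windows with Word and pywin32 installed
# (`path` is a hypothetical file name chosen by the caller):
#   import win32com.client
#   word = win32com.client.Dispatch("Word.Application")
#   doc = word.Documents.Open(path)
#   print(bold_ranges(doc))
```

This still makes one COM call per character, so it only removes the repeated lookups; it does not change the overall approach.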
> 
> >    
...
Thanks, it does! Using Word's built-in Find object helps as well.
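
A rough sketch of what a Find-based version might look like, following the usual Word pattern of searching for formatting only and collapsing the range after each hit. This is not the code actually used here: the function name find_bold_ranges and the no-progress guard are assumptions, and the two constants are written out as plain integers (wdFindStop and wdCollapseEnd are both 0 in Word's object model) so that no generated constants module is needed.

```python
WD_FIND_STOP = 0     # Word's wdFindStop: do not wrap around at the end
WD_COLLAPSE_END = 0  # Word's wdCollapseEnd: collapse a range to its end


def find_bold_ranges(doc):
    """Return (start, end) spans of bold text using Word's Find object.

    `doc` is assumed to behave like a Word Document COM object.
    """
    rng = doc.Content          # a Range covering the main document text
    find = rng.Find
    find.ClearFormatting()
    find.Text = ""             # no text criterion, formatting only
    find.Font.Bold = True
    find.Format = True
    find.Forward = True
    find.Wrap = WD_FIND_STOP

    spans = []
    last_end = -1
    # On a successful Execute, Word redefines `rng` to the matched text.
    while find.Execute():
        if rng.End <= last_end:
            # Assumed safety guard: stop if a match does not advance,
            # so the loop cannot spin at the end of the document.
            break
        spans.append((rng.Start, rng.End))
        last_end = rng.End
        rng.Collapse(WD_COLLAPSE_END)  # continue searching after the match
    return spans
```

Each Execute call jumps straight to the next bold run, so the number of COM calls grows with the number of bold runs rather than with the number of characters.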

Jean-Marc


