[Tutor] Search MS-Word
Pijus Virketis
virketis@post.harvard.edu
Tue, 18 Jun 2002 14:28:21 +0300
<HTML><HEAD>
<BASEFONT FACE="Arial" SIZE="2" COLOR="#000000">
</HEAD>
<BODY>
<div>Adolfo, <br></div>
<div><br>
<FONT COLOR="#000080">>I need to search for strings in hundreds of .doc and</FONT><br>
<FONT COLOR="#000080">>.txt documents. I am in W98 with Python 2.1.1 -</FONT><br>
<br></div>
<div>Let's take the two separately. <br></div>
<div> </div>
<div>Text files are relatively easy in principle. Depending on how sophisticated a search you need, either Python's standard string searching or regular expressions will do the trick handily. Here is a very simple example using a file "trivial.txt" containing "My name is Pijus."<br></div>
<div> </div>
<div>>>> source = open("c:\\trivial.txt", "r") #open the file for reading<br></div>
<div>>>> text = source.readlines() #put all text into a list<br></div>
<div>>>> import string<br></div>
<div>>>> for line in text:<br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if string.find(line, "Pijus") != -1: #if "Pijus" found anywhere<br></div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print line<br></div>
<div> </div>
<div>This will print every line that contains "Pijus", which here is "My name is Pijus." See the string module documentation for more information on find().<br></div>
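<div> </div>
<div>To scale this up to the hundreds of files in the question, one option is to walk a directory tree and apply a regular expression to each line. The following is only a sketch, and it assumes a current Python 3 rather than the 2.1.1 mentioned above (os.walk and the with statement are not available there). The function name find_in_text_files and its parameters are made up for illustration.<br></div>
<div> </div>

```python
import os
import re


def find_in_text_files(root, pattern, extensions=(".txt",)):
    """Yield (path, line_number, line) for every line matching pattern.

    root, pattern and extensions are illustrative names, not part of
    any library API.
    """
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Skip anything that is not a plain text file.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going past undecodable bytes.
            with open(path, "r", errors="replace") as source:
                for number, line in enumerate(source, 1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")
```

<div> </div>
<div>Calling list(find_in_text_files("c:\\docs", "Pijus")), where "c:\\docs" is a placeholder directory, collects every matching line together with its file and line number. For a plain substring the pattern can be the string itself; wrap it in re.escape() if it contains characters that are special in regular expressions.<br></div>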
<div> </div>
<div>Word files are a bit more tricky, since Python cannot simply open one up like it can a plain text file. If your code only needs to work on Windows, use COM to drive M$ Word directly. You will need to install the win32all extensions (check out the Windows section of python.org to get them). Then something along the following lines should work:<br></div>
<div> </div>
<div>>>> import win32com.client<br></div>
<div>>>> wrdobj = win32com.client.Dispatch("Word.Application")<br></div>
<div>>>> wrdobj.Visible = 1<br></div>
<div> </div>
<div>You should now be looking at a nice Word session. This is where Python sort of ends and the manipulation of the Word COM objects begins. Off the top of my head, I can only remember how to open a file:<br></div>
<div> </div>
<div>>>> wrdobj.Documents.Open("some_path")<br></div>
<div> </div>
<div>Check out the Word Object Browser, and perhaps the VBA documentation, to see what you need to do next in order to search the file you just opened. It should take just one more line of code to do something similar to the text example. <br></div>
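<div> </div>
<div>One possible way to finish the job, offered as an untested sketch rather than the definitive approach: read the document's text through its Content range and search it as an ordinary Python string. This assumes Windows, an installed copy of Word, the win32 extensions and a current Python 3. The helper names text_contains and find_in_word_files are invented here, and the case-insensitive match is my choice, not something the text example above does.<br></div>
<div> </div>

```python
def text_contains(text, needle):
    """Case-insensitive substring test on already-extracted text."""
    return needle.lower() in text.lower()


def find_in_word_files(paths, needle):
    """Return the paths of the Word documents whose text contains needle.

    Needs Windows, Word and the win32 extensions. Both function names
    are illustrative, not part of win32com or Word.
    """
    import win32com.client  # imported here so text_contains works anywhere

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = 0  # keep Word in the background
    hits = []
    try:
        for path in paths:
            # Safest to pass absolute paths: Word does not resolve them
            # against Python's working directory.
            doc = word.Documents.Open(path)
            try:
                # Content is a Range covering the main text of the document.
                if text_contains(doc.Content.Text, needle):
                    hits.append(path)
            finally:
                doc.Close(False)  # False: do not save changes
    finally:
        word.Quit()
    return hits
```

<div> </div>
<div>Pulling the whole text into Python keeps the search logic identical to the text-file case. Word's own Find object (see the VBA documentation) is the alternative if you want Word to do the searching.<br></div>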
<div> </div>
<div>Cheers, <br></div>
<div> </div>
<div>Pijus<br></div>
<div> </div>
<div>-- <br></div>
<div>"Anyone attempting to generate random numbers by deterministic means is, of course, living in a state of sin." -- John Von Neumann<br></div>
</body></html>