[Tutor] Search MS-Word

Pijus Virketis virketis@post.harvard.edu
Tue, 18 Jun 2002 14:28:21 +0300


<HTML><HEAD>
<BASEFONT FACE="Arial" SIZE="2" COLOR="#000000">
</HEAD>
<BODY>
<div>Adolfo, <br></div>
<div><br>
<FONT COLOR="#000080">&gt;I need to search for strings in hundreds of .doc and</FONT><br>
<FONT COLOR="#000080">&gt;.txt documents. I am in W98 with Python 2.1.1 -</FONT><br>
<br></div>
<div>Let's take the two separately. <br></div>
<div>&nbsp;</div>
<div>Text files are relatively easy in principle. Depending on
 how sophisticated a search you need, either Python's standard
 string search functions or regular expressions
 will do the trick handily. Here is a very simple example using a
 file &quot;trivial.txt&quot; containing &quot;My name is
 Pijus.&quot;<br></div>
<div>&nbsp;</div>
<div>&gt;&gt;&gt; source = open(&quot;c:\\trivial.txt&quot;, &quot;r&quot;) #open the file for reading<br></div>
<div>&gt;&gt;&gt; text = source.readlines() #put all text into a list<br></div>
<div>&gt;&gt;&gt; import string<br></div>
<div>&gt;&gt;&gt; for line in text:<br></div>
<div>&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;if string.find(line, &quot;Pijus&quot;) != -1: #if &quot;Pijus&quot; found anywhere<br></div>
<div>&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;print line<br></div>
<div>&nbsp;</div>
<div>This will print the matching line, &quot;My name is Pijus.&quot;
 See the string module documentation for more information on
 find().<br></div>
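<div>&nbsp;</div>
<div>Since the question is about hundreds of files, here is a rough sketch of how the same idea could be stretched over a whole directory tree, using a regular expression instead of string.find(). It assumes a current Python 3 interpreter rather than the 2.1.1 mentioned above, and the function name search_text_files and its parameters are made up for illustration.<br></div>

```python
import os
import re


def search_text_files(root, pattern, extensions=(".txt",)):
    """Return (path, line_number, line) for every line matching pattern.

    Hypothetical helper: walks root recursively and scans only files
    whose names end with one of the given extensions.
    """
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # endswith() accepts a tuple, so several extensions can be listed
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going past undecodable bytes
            with open(path, "r", errors="replace") as source:
                for number, line in enumerate(source, 1):
                    if regex.search(line):
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

<div>Calling search_text_files(&quot;c:\\docs&quot;, &quot;Pijus&quot;) would then give back a list of (path, line number, line) tuples. The search is case sensitive unless the pattern says otherwise, for example by starting it with (?i).<br></div>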
<div>&nbsp;</div>
<div>Word files are a bit trickier, since Python cannot simply
 open one up as it can a plain text file. If you need your
 code to work only on Windows, use COM to access MS Word
 directly. You will need to install the win32all extensions
 (check out the python.org Windows section to get them). Then
 something along the following lines should work:<br></div>
<div>&nbsp;</div>
<div>&gt;&gt;&gt; import win32com.client<br></div>
<div>&gt;&gt;&gt; wrdobj = win32com.client.Dispatch(&quot;Word.Application&quot;) <br></div>
<div>&gt;&gt;&gt; wrdobj.Visible = 1<br></div>
<div>&nbsp;</div>
<div>You should now be looking at a nice Word session. This is where
 Python sort of ends and the manipulation of the Word COM objects
 begins. Off the top of my head, I can only remember how to open a
 file:<br></div>
<div>&nbsp;</div>
<div>&gt;&gt;&gt; wrdobj.Documents.Open(&quot;some_path&quot;)<br></div>
<div>&nbsp;</div>
<div>Check out the Word Object Browser and perhaps VBA
 documentation to see what you need to do next in order to search
 the file you just opened. It should take just one more line of
 code to do something similar to the text example. <br></div>
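<div>&nbsp;</div>
<div>For what it is worth, one possible way to do that search is to read the document body through the Content.Text property of the opened document and test it with an ordinary substring check. This is only a sketch, not necessarily the best route through the Word object model: it assumes Word and the win32 extensions are installed, and the function names word_doc_contains and search_word_files are invented for this example.<br></div>

```python
def word_doc_contains(word_app, path, needle):
    """Open path in a running Word instance and test its text for needle."""
    # Documents.Open and Close belong to the Word object model;
    # Content.Text holds the document body as one plain string.
    doc = word_app.Documents.Open(path)
    try:
        return needle in doc.Content.Text
    finally:
        doc.Close(False)  # False: discard changes, do not prompt to save


def search_word_files(paths, needle):
    """Return the subset of paths whose Word text contains needle."""
    import win32com.client  # needs the win32 extensions, Windows only

    word_app = win32com.client.Dispatch("Word.Application")
    try:
        return [p for p in paths if word_doc_contains(word_app, p, needle)]
    finally:
        word_app.Quit()  # shut down the Word instance started above
```

<div>It is probably safest to pass absolute paths, since Word is a separate process and may not share Python's working directory. Starting Word once and reusing it for every file, as search_word_files does, should also be much faster than launching it per document.<br></div>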
<div>&nbsp;</div>
<div>Cheers, <br></div>
<div>&nbsp;</div>
<div>Pijus<br></div>
<div>&nbsp;</div>
<div>-- <br></div>
<div>&quot;Anyone attempting to generate random numbers by
 deterministic means is, of course, living in a state of
 sin.&quot; -- John von Neumann<br></div>
</body></html>