MS Word -- finding text

Mark Hammond mhammond at
Mon Jun 17 00:57:14 CEST 2002

Mike Brenner wrote:
 > The COM objects (like Project, Word, Excel, etc.) sometimes return
 > stuff in Unicode format. When they do, the python str() function dies
 > when converting non-ASCII unicode characters.
 > To avoid this problem, I use the following conversion routine. After
 > making the necessary check for None, it attempts a quick conversion
 > with str() first. When necessary, it slowly goes through each character,
 > handling the exceptions that are raised.
 > The default is a prime because that is the most common character that
 > hits me in Word and Excel documents. Instead of coding it as an ASCII
 > single-quote character, these applications code it as a more
 > "beautiful" character, so it kills the python str() function.
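
The routine described above is not included in the quoted message. As a
rough sketch of the idea only (not Mike's actual code), it might look
something like the following. It is written for Python 3, where str() no
longer fails on non-ASCII text, so the "quick conversion" is shown as an
ASCII encode instead. The name to_ascii, the default argument, and
returning an empty string for None are all assumptions made for
illustration.

```python
def to_ascii(value, default="'"):
    """Return an ASCII-only str, substituting `default` for other characters.

    Hypothetical helper: the name, signature, and None handling are assumed.
    """
    if value is None:
        return ""  # assumption: the original's None handling is not shown
    text = str(value)
    try:
        # Fast path: succeeds when every character is plain ASCII.
        text.encode("ascii")
        return text
    except UnicodeEncodeError:
        pass
    # Slow path: test one character at a time and replace the failures.
    out = []
    for ch in text:
        try:
            ch.encode("ascii")
            out.append(ch)
        except UnicodeEncodeError:
            out.append(default)
    return "".join(out)
```

With this sketch, a string containing Word's curly apostrophe (U+2019)
comes back with a plain ASCII single quote in its place.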

Note that you should be able to convert a Unicode object obtained from a 
COM object simply by saying:

s = u_ob.encode("mbcs")

This should never (OK OK - rarely :) fail.  If it _does_ fail, it means
that the string cannot be represented in the mbcs code page, and you
will need to determine the appropriate code page yourself.
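
For the failure case, one possible approach is to try "mbcs" first and
then fall back to a code page you name explicitly. The following is only
a sketch under some assumptions: it targets Python 3, the helper name
encode_com_text is made up, and cp1252 is just an example fallback, not
a recommendation for any particular document. The "mbcs" codec is only
registered on Windows, so the sketch checks for it before using it.

```python
import codecs


def encode_com_text(text, fallback_codepage="cp1252"):
    """Encode text to bytes, trying "mbcs" first, then a named code page.

    Hypothetical helper: the name and the cp1252 default are assumptions.
    """
    if text is None:
        return None
    candidates = []
    try:
        codecs.lookup("mbcs")  # raises LookupError on non-Windows platforms
        candidates.append("mbcs")
    except LookupError:
        pass
    candidates.append(fallback_codepage)
    for name in candidates:
        try:
            return text.encode(name)
        except UnicodeEncodeError:
            continue  # this code page cannot represent the string
    # Last resort: substitute "?" for whatever the fallback cannot encode.
    return text.encode(fallback_codepage, errors="replace")
```

The result depends on the machine's active code page when "mbcs" is
available, so the same string can encode to different bytes on different
systems.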


