MS Word -- finding text

Mark Hammond mhammond at
Mon Jun 17 00:57:14 CEST 2002

Mike Brenner wrote:
 > The COM objects (like Project, Word, Excel, etc.) sometimes return
 > stuff in Unicode format. When they do, the python str() function dies
 > when converting non-ASCII unicode characters.
 > To avoid this problem, I use the following conversion routine. After
 > making the necessary check for None, it attempts a quick conversion
 > str() first. When necessary, it slowly goes through each character,
 > handling the exceptions that are raised.
 > The default is a prime because that is the most common character that
 > hits me in Word and Excel documents. Instead of coding it as an ASCII
 > single-quote character, these applications code it as a more
 > "beautiful" character, so it kills the python str() function.

Note that you should be able to convert a Unicode object obtained from a 
COM object simply by saying:

s = u_ob.encode("mbcs")

This should never (OK OK - rarely :) fail.  If it _does_ fail, then it 
means that the string cannot be represented in the mbcs code page (the 
system's ANSI code page on Windows), and you will need to determine the 
appropriate code page yourself.
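
As a rough sketch of the above, and not something from the original 
exchange: the helper name encode_for_ansi and the "cp1252" fallback are 
made up for illustration, and the code assumes Python 3, where encode() 
returns bytes. The "mbcs" codec exists only on Windows, so the sketch 
falls back to an explicitly named code page elsewhere, and substitutes 
"?" for characters that the chosen code page cannot represent.

```python
import codecs


def encode_for_ansi(text, fallback="cp1252"):
    """Encode text with "mbcs" if available, else with `fallback`.

    Hypothetical helper: the name and the cp1252 default are assumptions.
    """
    try:
        # "mbcs" is only registered on Windows builds of Python.
        codecs.lookup("mbcs")
        codec = "mbcs"
    except LookupError:
        codec = fallback
    try:
        # Strict encoding first, so nothing is altered when it fits.
        return text.encode(codec)
    except UnicodeEncodeError:
        # The string does not fit the code page; replace what cannot
        # be represented instead of raising.
        return text.encode(codec, errors="replace")


if __name__ == "__main__":
    # U+2019 is the "beautiful" apostrophe Word likes to insert.
    print(encode_for_ansi("It\u2019s"))
```

If the replacement characters matter, pass the code page that really 
matches the document as fallback rather than relying on the default.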

