MS Word -- finding text

Prema prema at prema.co.nz
Sun Jun 16 22:42:00 EDT 2002


Hi Mike!
Excellent, thanks very much for your time and for sharing this.
It would have taken me a while to work through that. I can't wait to try it out!
I'll post to this thread to report how I got on.
Kind regards, and thanks again
Mike




Mike Brenner <mikeb at mitre.org> wrote in message news:<mailman.1024247060.12127.python-list at python.org>...
> The COM objects (like Project, Word, Excel, etc.) sometimes return strings as Unicode. When they do, Python's str() function raises an exception on any non-ASCII Unicode character.
> 
> To avoid this problem, I use the following conversion routine. After making the necessary check for None, it first attempts a quick conversion with str(). Only when that fails does it go slowly through each character, handling the exceptions that are raised.
> 
> The default replacement character is a backquote, standing in for a prime, because the apostrophe is the most common character that hits me in Word and Excel documents. Instead of coding it as an ASCII single-quote character, these applications code it as a more "beautiful" typographic character, which kills Python's str() function.
> 
> Depending on your needs, you may wish to change the return statement to drop the string.strip call.
> 
> You could also make a separate function that contains just the try and the except, so that map() can be used instead of the for loop.
> 
> Mike Brenner
> 
> 
> import string
> 
> def phrase_unicode2string(message):
>     """
>     phrase_unicode2string works around the built-in function str(message),
>     which raises an exception when given non-ASCII Unicode characters.
>     """
>     if message is None:
>         return ""
>     try:
>         st = str(message)  # fast path: everything is plain ASCII
>     except UnicodeError:  # untranslatable Unicode character
>         chars = []
>         for uc in message:
>             try:
>                 c = str(uc)
>             except UnicodeError:
>                 c = "`"  # default replacement character
>             chars.append(c)
>         # Note: because str() raises an exception instead of returning
>         # a default character, we cannot use map() here directly.
>         st = string.join(chars, "")
>     return string.strip(st)
> 
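
The map() variant suggested above could be sketched as follows. This is an adaptation for current Python 3, not the code from the post: there str() no longer fails on non-ASCII text, so the helper tests each character with encode("ascii") instead. The names ascii_or_default and phrase_to_ascii are made up for illustration.

```python
def ascii_or_default(ch, default="`"):
    """Return ch if it is plain ASCII, otherwise the default character."""
    try:
        ch.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    except UnicodeEncodeError:
        return default
    return ch


def phrase_to_ascii(message):
    """Map every character through ascii_or_default, then strip whitespace."""
    if message is None:
        return ""
    # The helper holds the try/except, so map() can replace the for loop.
    return "".join(map(ascii_or_default, str(message))).strip()


print(phrase_to_ascii("  Conman\u2019s plan  "))  # prints: Conman`s plan
```

Keeping the try/except in its own function is what makes map() usable: the exception is caught per character inside the helper rather than aborting the whole map() call.
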
> ------------------------
> 
> Mike Prema wrote: 
> 
> #######
> from win32com.client import Dispatch
> W=Dispatch('Word.Application')
> D=W.Documents.Open('c:\\windows\\Desktop\\TOR.doc') ## Test Doc
> FindRange=D.Content
> F=FindRange.Find.Execute('Conman','True')
> print FindRange.Text
> #######
> str() doesn't seem to work in this case.
> I tried using the codecs library, but I think I am missing something.
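
On the codecs question: a possible alternative to the character-by-character loop, assuming plain ASCII output is the goal, is to let the encoder's error handler do the substitution. The sketch below is written for current Python 3 and is not from the thread. The TYPOGRAPHIC table and the to_ascii name are hypothetical, and the table lists only a few punctuation characters that Word commonly substitutes.

```python
# Hypothetical table: typographic punctuation mapped to ASCII equivalents.
TYPOGRAPHIC = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark (Word's "smart" apostrophe)
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
}


def to_ascii(text):
    """Replace known typographic characters, then mark the rest with '?'."""
    plain = text.translate(TYPOGRAPHIC)
    # The "replace" error handler substitutes '?' for anything still
    # outside ASCII instead of raising UnicodeEncodeError.
    return plain.encode("ascii", "replace").decode("ascii")


print(to_ascii("\u201cConman\u2019s\u201d caf\u00e9"))  # prints: "Conman's" caf?
```

Unlike the backquote default, this keeps apostrophes and quotes readable and only falls back to a placeholder for characters the table does not cover.
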