[python-win32] UnicodeEncodingError when print a doc file

Wed Jun 15 03:02:06 CEST 2011

cool_go_blue wrote:
> Thanks. It works. Actually, what I want to do is to parse the whole
> document. How can I retrieve the list of words in the
> document? I use the following code:
>
> for word in doc.Content.Text.encode("cp1252", "replace"):
>     print word
>
> It seems that each word is a single character.
>

No, what you are getting back is a Python string.  When you iterate
over a string, you get one character at a time.  This is basic Python.
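
For example, here is a minimal sketch with a made-up literal standing in
for doc.Content.Text (the sample text is an assumption, not your document):

```python
# Hypothetical sample standing in for doc.Content.Text
text = u"two words"

# Iterating over a string visits it one character at a time,
# so this builds a list of single characters, not words
chars = [ch for ch in text]
print(chars)
```

The list has nine entries, one per character, including the space.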

If your words are all separated by whitespace, you can use split:

    for word in doc.Content.Text.encode("cp1252","replace").split():
        print word

Note, however, that you don't need to convert it to an 8-bit character
set until you want to print it.  If you are going to process these
words, then you might as well leave them in Unicode.
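
A rough sketch of that approach, again using a made-up Unicode literal
in place of doc.Content.Text (the sample text and the variable names are
assumptions for illustration).  The words stay in Unicode while you work
with them, and are encoded only at the point of output:

```python
# Hypothetical sample standing in for doc.Content.Text.
# Word separates paragraphs with "\r", which split() treats as whitespace.
text = u"na\u00efve caf\u00e9\rsecond paragraph \u4e2d"

# Process in Unicode: split on any run of whitespace
words = text.split()

# Lengths here are counted in characters, not bytes
lengths = [len(w) for w in words]

# Encode only for output; anything cp1252 cannot represent becomes "?"
encoded = [w.encode("cp1252", "replace") for w in words]
for raw in encoded:
    print(repr(raw))
```

The last word is a character that cp1252 cannot represent, so it comes
out as a question mark; the accented characters survive the encoding.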

-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.