[python-win32] UnicodeEncodingError when print a doc file

cool_go_blue cool_go_blue at yahoo.com
Wed Jun 15 02:49:25 CEST 2011


Thanks, it works. What I actually want to do is parse the whole document. How can I retrieve the list of words in the
document? I use the following code:

for word in doc.Content.Text.encode("cp1252", "replace"):
    print word

It seems that each "word" is actually a single character. Which API can I use to process the words in an open Word document? Thanks.
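For what it's worth, the loop above yields characters because iterating over a string walks it one character at a time. A minimal sketch of splitting on whitespace instead, assuming the text has already been pulled out of the document (the `sample` string and the `split_words` helper are made up for illustration; Word appears to separate paragraphs with a carriage return, which `split()` treats as whitespace):

```python
def split_words(text):
    """Return the whitespace-separated tokens of `text` as a list."""
    return text.split()

# Hypothetical stand-in for doc.Content.Text.
sample = u"Hello Word document\rSecond paragraph"

# Iterating over the string itself gives single characters...
chars = [c for c in sample][:5]

# ...while split() gives whole words.
words = split_words(sample)
```

Punctuation stays attached to the words with this approach. If I recall correctly, the Word object model also exposes a Words collection on the document (doc.Words), which may be closer to what you want, but I have not checked how it treats punctuation.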


--- On Tue, 6/14/11, Preston Landers <planders at gmail.com> wrote:

From: Preston Landers <planders at gmail.com>
Subject: Re: [python-win32] UnicodeEncodingError when print a doc file
To: "cool_go_blue" <cool_go_blue at yahoo.com>
Date: Tuesday, June 14, 2011, 12:37 PM

The document contains Unicode characters that cannot be represented in cp1252 (Windows-1252), the encoding your console uses when you call the print statement.
You can always write the content to a file in UTF-8 or UTF-16 and then view the file in a program like Notepad that can handle Unicode. I'm not sure if there's any way to get the Windows console to produce actual Unicode.
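A rough sketch of the write-to-a-file approach, assuming the text is already a Unicode string. The `save_text_utf8` helper, the sample string, and the output file name are all made up for illustration; in the real script the text would come from doc.Content.Text:

```python
import io
import os
import tempfile

def save_text_utf8(text, path):
    """Write a Unicode string to `path` encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

# Hypothetical stand-in for doc.Content.Text; u"\uf06d" is the
# character named in the traceback.
sample_text = u"Symbol: \uf06d"

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "word_text_utf8.txt")
save_text_utf8(sample_text, out_path)

# Read the file back to confirm nothing was lost.
with io.open(out_path, "r", encoding="utf-8") as handle:
    round_trip = handle.read()
```

UTF-8 can encode every Unicode code point, so no characters should be dropped or replaced on the way to disk.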


If you absolutely must print this to the console, you can replace the characters that cannot be encoded:
print doc.Content.Text.encode("cp1252", "replace")
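To show what the "replace" error handler does, here is a small sketch on a made-up string (the `to_console_bytes` helper and `sample` are illustrative only, not part of the original script). A strict encode fails on the unmappable character, while "replace" substitutes a question mark:

```python
def to_console_bytes(text, encoding="cp1252"):
    """Encode `text`, replacing unencodable characters with '?'."""
    return text.encode(encoding, "replace")

# u"\xe9" exists in cp1252; u"\uf06d" (from the traceback) does not.
sample = u"caf\xe9 \uf06d"

# The default strict handler raises on the unmappable character.
try:
    sample.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The "replace" handler swaps it for a question mark instead.
safe = to_console_bytes(sample)
```

The replacement is lossy, so this is only suitable for display, not for storing or further processing the text.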


Hope this helps,
Preston

On Tue, Jun 14, 2011 at 11:30 AM, cool_go_blue <cool_go_blue at yahoo.com> wrote:


I am trying to read a Word document as follows:

import win32com.client

app = win32com.client.Dispatch('Word.Application')
# Raw string so the backslash in the Windows path is kept literally.
doc = app.Documents.Open(r'D:\myfile.doc')
print doc.Content.Text

I receive the following error:

Traceback (most recent call last):
  File "D:\projects\Myself\MySVD\src\ReadWord.py", line 11, in <module>
    print doc.Content.Text
  File "D:\Softwares\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\uf06d' in position 4397: character maps to <undefined>

How can I fix this problem? Thanks.




