[Tutor] UnicodeEncodeError: 'cp932' codec can't encode character '\xe9' in position

Sun Mar 11 11:21:37 CET 2012

Robert Sjoblom wrote:

> Okay, so here's a fun one. Since I'm on a Japanese locale my native
> encoding is cp932. I was thinking of writing a parser for a bunch of
> text files, but I stumbled on even printing the contents due to ...
> something. I don't know what encoding the text file uses, which isn't
> helping my case either (I have asked, but I've yet to get an answer).
> 
> Okay, so:
> 
> address = "C:/Path/to/file/file.ext"
> with open(address, encoding="cp1252") as alpha:

Superfluous readlines() alert:

>     text = alpha.readlines()
>     for line in text:
>         print(line)

You can iterate over the file directly with

#python3
for line in alpha:
    print(line, end="")

or even

import sys
sys.stdout.writelines(alpha)
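
To check that both variants write the same text, here is a minimal sketch
that uses io.StringIO objects as stand-ins for the file and for stdout. The
sample text and the function names copy_with_print and copy_with_writelines
are made up for illustration.

```python
import io

def copy_with_print(instream, outstream):
    # Each line keeps its trailing newline, so suppress print()'s own.
    for line in instream:
        print(line, end="", file=outstream)

def copy_with_writelines(instream, outstream):
    # writelines() accepts any iterable of strings, including a file object.
    outstream.writelines(instream)

sample = "first line\nsecond line\n"  # hypothetical file contents

out_print = io.StringIO()
copy_with_print(io.StringIO(sample), out_print)

out_writelines = io.StringIO()
copy_with_writelines(io.StringIO(sample), out_writelines)

print(out_print.getvalue() == out_writelines.getvalue())
```

With a plain print(line) instead of print(line, end="") every line would be
followed by an extra blank line, because the newline read from the file and
the one added by print() both end up in the output.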

> It starts to print until it hits the wonderful character é or '\xe9',
> where it gives me this happy traceback:
> Traceback (most recent call last):
>   File "C:\Users\Azaz\Desktop\CK2 Map Painter\Parser\test parser.py", line 8, in <module>
>     print(line)
> UnicodeEncodeError: 'cp932' codec can't encode character '\xe9' in position 13: illegal multibyte sequence
> 
> I can open the document and view it in UltraEdit -- and it displays
> correct characters there -- but UE can't give me what encoding it
> uses. Any chance of solving this without having to switch from my
> Japanese locale? Also, the cp1252 is just an educated guess, but it
> doesn't really matter because it always comes back to the cp932 error.

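# python3
import codecs
import sys

filename = "C:/Path/to/file/file.ext"  # the "address" from your script

# Fall back to UTF-8 if stdout reports no encoding.
output_encoding = sys.stdout.encoding or "UTF-8"
error_handling = "replace"
Writer = codecs.getwriter(output_encoding)

# Wrap the underlying byte stream so that unencodable characters are
# replaced instead of raising UnicodeEncodeError.
outstream = Writer(sys.stdout.buffer, error_handling)
with open(filename, "r", encoding="cp1252") as instream:
    for line in instream:
        print(line, end="", file=outstream)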

error_handling = "replace" prints "?" for every character that cannot be
encoded in the target encoding, instead of raising UnicodeEncodeError.
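
If "?" loses too much information, the other standard error handlers may be
worth a look. A small sketch, assuming the same cp932 target as in your
traceback; the sample string and the results dict are made up for
illustration:

```python
text = "caf\xe9"  # "café"; é cannot be encoded in cp932

handlers = ("replace", "backslashreplace", "xmlcharrefreplace", "ignore")

# Encode the same text once per error handler and keep the resulting bytes.
results = {name: text.encode("cp932", errors=name) for name in handlers}

for name, encoded in results.items():
    print(name, encoded)

# replace            b'caf?'
# backslashreplace   b'caf\\xe9'
# xmlcharrefreplace  b'caf&#233;'
# ignore             b'caf'
```

Any of these names can be used as error_handling in the script above;
"backslashreplace" at least lets you see which character was in the file.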