[Tutor] UnicodeEncodeError: 'cp932' codec can't encode character '\xe9' in position
__peter__ at web.de
Sun Mar 11 11:21:37 CET 2012
Robert Sjoblom wrote:
> Okay, so here's a fun one. Since I'm on a japanese locale my native
> encoding is cp932. I was thinking of writing a parser for a bunch of
> text files, but I stumbled on even printing the contents due to ...
> something. I don't know what encoding the text file uses, which isn't
> helping my case either (I have asked, but I've yet to get an answer).
> Okay, so:
> address = "C:/Path/to/file/file.ext"
> with open(address, encoding="cp1252") as alpha:
Superfluous readlines() alert:
> text = alpha.readlines()
> for line in text:
You can iterate over the file directly with
for line in alpha:
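A small self-contained sketch of that iteration pattern follows. The helper name count_non_ascii_lines and the temporary sample file are made up for illustration and are not part of the original script; the point is only that the file object yields one line at a time, so no readlines() list is needed.

```python
import os
import tempfile


def count_non_ascii_lines(path, encoding="cp1252"):
    """Count lines containing a non-ASCII character, reading lazily."""
    count = 0
    with open(path, encoding=encoding) as handle:
        # The file object is its own iterator: one line per step,
        # without building the whole list in memory first.
        for line in handle:
            if any(ord(ch) > 127 for ch in line):
                count += 1
    return count


# Hypothetical sample data standing in for the real text file.
with tempfile.NamedTemporaryFile(
    "w", encoding="cp1252", suffix=".txt", delete=False
) as tmp:
    tmp.write("first line\ncaf\xe9\nlast line\n")
    sample_path = tmp.name

non_ascii_count = count_non_ascii_lines(sample_path)
os.remove(sample_path)
```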
> It starts to print until it hits the wonderful character é or '\xe9',
> where it gives me this happy traceback:
> Traceback (most recent call last):
> File "C:\Users\Azaz\Desktop\CK2 Map Painter\Parser\test parser.py",
> line 8, in <module>
> UnicodeEncodeError: 'cp932' codec can't encode character '\xe9' in
> position 13: illegal multibyte sequence
> I can open the document and view it in UltraEdit -- and it displays
> correct characters there -- but UE can't give me what encoding it
> uses. Any chance of solving this without having to switch from my
> japanese locale? Also, the cp1252 is just an educated guess, but it
> doesn't really matter because it always comes back to the cp932 error.
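import codecs
import sys

# Fall back to UTF-8 if stdout reports no encoding.
output_encoding = sys.stdout.encoding or "UTF-8"
error_handling = "replace"
# Wrap the binary stdout stream in a writer that applies the error handler.
Writer = codecs.getwriter(output_encoding)
outstream = Writer(sys.stdout.buffer, error_handling)

with open(filename, "r", encoding="cp1252") as instream:
    for line in instream:
        print(line, end="", file=outstream)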
Setting error_handling = "replace" prints "?" for characters that cannot be
encoded in the target encoding.
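To see the effect of the error handler in isolation, here is a minimal sketch that encodes a string directly instead of going through stdout. The helper encode_for_console and the sample string are assumptions for illustration; "backslashreplace" is another standard handler, shown as an alternative when you would rather see which character was dropped.

```python
def encode_for_console(text, encoding="cp932", errors="replace"):
    """Encode text the way a console writer would, with a lenient handler."""
    return text.encode(encoding, errors)


sample = "caf\xe9"  # ends in the character from the traceback

# The default "strict" handler raises, as in the original error.
try:
    sample.encode("cp932")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "replace" substitutes "?" for anything the codec cannot represent.
replaced = encode_for_console(sample)

# "backslashreplace" keeps the code point visible as an escape sequence.
escaped = encode_for_console(sample, errors="backslashreplace")
```

Which handler is preferable depends on whether the output is meant for reading or for debugging; the escape form makes it easier to spot the offending characters.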