random832 at fastmail.com
Fri Jan 13 17:36:11 EST 2017
On Fri, Jan 13, 2017, at 17:24, D'Arcy Cain wrote:
> I thought I was done with this crap once I moved to 3.x but some
> Winblows machines are still sending what some circles call "Extended
> ASCII". I have a file that I am trying to read and it is barfing on
> some characters. For example:
> due to the Qu\xe9bec government
> Obviously should be "due to the Québec government". I can't figure out
> what that encoding is or if it is anything that can even be understood
> outside of M$. I have tried ascii, cp437, cp858, cp1140, cp1250,
> latin-1, utf8 and others. None of them recognize that character. Can
> someone tell me what encoding includes that character please.
It's latin-1 (or possibly cp1252 or something else, depending on the
other characters in the file). Your problem is elsewhere.
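A quick way to check this is to decode the quoted bytes directly. The sketch below is illustrative only: the variable names are made up, and the byte string is just the fragment from your message, not your actual file. It uses ascii() for output so that the check itself cannot trip over the terminal encoding.

```python
# Illustrative check: which codecs turn the byte 0xE9 into U+00E9?
raw = b"due to the Qu\xe9bec government"

decoded = {}
for name in ("latin-1", "cp1252", "utf-8"):
    try:
        decoded[name] = raw.decode(name)
    except UnicodeDecodeError as exc:
        # A lone 0xE9 is not a valid UTF-8 sequence, so utf-8 lands here.
        decoded[name] = None
        print(name, "cannot decode:", exc.reason)
    else:
        # ascii() escapes non-ASCII characters, so this prints anywhere.
        print(name, "->", ascii(decoded[name]))
```

latin-1 and cp1252 agree on this byte; they only differ in the 0x80-0x9F range, which is why the other characters in the file matter.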
> Here is the failing code:
> with open(sys.argv[1], encoding="latin-1") as fp:
>     for ln in fp:
> Traceback (most recent call last):
> File "./load_iff", line 11, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in
> position 132: ordinal not in range(128)
Note that this is an encode error: it's converting *from* unicode *to*
bytes, for the print call.
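The direction is visible in the exception object itself. As a minimal sketch (the string is the fragment from your message, already decoded, and the variable names are only for illustration), encoding it with the ascii codec raises the same exception type as your traceback:

```python
# The text is already a str here; the failure happens when it is
# turned back into bytes with a codec that has no mapping for U+00E9.
text = "due to the Qu\u00e9bec government"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failed_codec = exc.encoding                # codec that was asked to encode
    bad_char = exc.object[exc.start:exc.end]   # the character it rejected
    position = exc.start                       # index of that character in the str
    print(type(exc).__name__, failed_codec, ascii(bad_char), position)
```

A decoding problem with the input file would have raised UnicodeDecodeError instead.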
> I don't understand why the error says "ascii" when I told it to use
You set the encoding for the file, not for the output. The problem is in
your print call: stdout is using the ascii codec, probably because your
locale is set to "C" or not set up at all instead of e.g. "en_CA.UTF-8".
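To see the effect of the output encoding without touching your real locale, here is a rough simulation. It assumes nothing about your script: encode_like_stdout is a hypothetical helper that wraps an in-memory buffer in a text stream with a chosen encoding, standing in for sys.stdout.

```python
import io

def encode_like_stdout(text, encoding):
    """Print text to an in-memory stream that uses the given encoding
    and return the bytes that would have been written."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding, newline="\n")
    print(text, file=stream)
    stream.flush()
    return buf.getvalue()

line = "due to the Qu\u00e9bec government"

# A UTF-8 stream (what a UTF-8 locale would normally give you) is fine.
utf8_bytes = encode_like_stdout(line, "utf-8")
print(utf8_bytes)

# An ASCII stream (what the "C" locale has traditionally meant) fails in print().
try:
    encode_like_stdout(line, "ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii stream failed:", exc.reason)
```

On your machine, printing sys.stdout.encoding shows which codec print is actually using. Setting a UTF-8 locale, or the PYTHONIOENCODING environment variable to utf-8, should change it.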