Extended ASCII
Jon Ribbens
jon+usenet at unequivocal.eu
Fri Jan 13 17:43:24 EST 2017
On 2017-01-13, D'Arcy Cain <darcy at VybeNetworks.com> wrote:
> I thought I was done with this crap once I moved to 3.x but some
> Winblows machines are still sending what some circles call "Extended
> ASCII". I have a file that I am trying to read and it is barfing on
> some characters. For example:
>
> due to the Qu\xe9bec government
>
> Obviously should be "due to the Québec government". I can't figure out
> what that encoding is or if it is anything that can even be understood
> outside of M$.
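The failure itself is easy to reproduce. Python 3 decodes text files with the locale's preferred encoding, which is typically UTF-8 on Linux and macOS, and in UTF-8 the byte 0xE9 starts a three-byte sequence that must continue with bytes in the range 0x80 to 0xBF. Here it is followed by an ASCII "b", so decoding fails at that byte. A minimal sketch, assuming the file was read as UTF-8 (the quoted question does not say which encoding was used; the sample is the quoted line):

```python
# Assumption: the file was read as UTF-8, Python 3's usual default outside Windows.
raw = b"due to the Qu\xe9bec government"

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference: `exc` is unbound once the except block ends
    # 0xE9 opens a 3-byte UTF-8 sequence, but the next byte (0x62, "b") is
    # not a continuation byte (0x80-0xBF), so decoding stops at offset 13.
    print(error.start, hex(raw[error.start]), error.reason)
```

A single-byte codec such as latin_1 never fails this way, because it maps all 256 byte values; that is why a brute-force search over codecs has to compare the decoded text against the expected "Québec" rather than merely catching errors.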
$ cat decode.py
#!/usr/bin/env python3
CODECS = (
    "ascii", "big5", "big5hkscs", "cp037", "cp273", "cp424", "cp437", "cp500",
    "cp720", "cp737", "cp775", "cp850", "cp852", "cp855", "cp856", "cp857",
    "cp858", "cp860", "cp861", "cp862", "cp863", "cp864", "cp865", "cp866",
    "cp869", "cp874", "cp875", "cp932", "cp949", "cp950", "cp1006", "cp1026",
    "cp1125", "cp1140", "cp1250", "cp1251", "cp1252", "cp1253", "cp1254",
    "cp1255", "cp1256", "cp1257", "cp1258", "cp65001", "euc_jp",
    "euc_jis_2004", "euc_jisx0213", "euc_kr", "gb2312", "gbk", "gb18030", "hz",
    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",
    "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr", "latin_1", "iso8859_2",
    "iso8859_3", "iso8859_4", "iso8859_5", "iso8859_6", "iso8859_7",
    "iso8859_8", "iso8859_9", "iso8859_10", "iso8859_11", "iso8859_13",
    "iso8859_14", "iso8859_15", "iso8859_16", "johab", "koi8_r", "koi8_t",
    "koi8_u", "kz1048", "mac_cyrillic", "mac_greek", "mac_iceland",
    "mac_latin2", "mac_roman", "mac_turkish", "ptcp154", "shift_jis",
    "shift_jis_2004", "shift_jisx0213", "utf_32", "utf_32_be", "utf_32_le",
    "utf_16", "utf_16_be", "utf_16_le", "utf_7", "utf_8", "utf_8_sig",
)
for encoding in CODECS:
    try:
        if b"Qu\xe9bec".decode(encoding) == "Québec":
            print(encoding)
    except (UnicodeError, LookupError):
        pass
$ ./decode.py
cp1250
cp1252
cp1254
cp1256
cp1257
cp1258
latin_1
iso8859_2
iso8859_3
iso8859_4
iso8859_9
iso8859_10
iso8859_13
iso8859_14
iso8859_15
iso8859_16
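A single 0xE9 byte cannot tell these sixteen codecs apart, because they all map it to "é". If the file contains other non-ASCII bytes, one way to narrow the list is to decode a larger sample with every surviving candidate and group the codecs by the text they produce; a codec that raises on the sample drops out entirely. The following is a sketch under that assumption: `group_by_decoding` and `CANDIDATES` are hypothetical names, and the extra byte 0xE8 in the sample is chosen for illustration, not taken from the file.

```python
#!/usr/bin/env python3
from collections import defaultdict

# The codecs that decode.py reported as mapping b"\xe9" to "é".
CANDIDATES = (
    "cp1250", "cp1252", "cp1254", "cp1256", "cp1257", "cp1258",
    "latin_1", "iso8859_2", "iso8859_3", "iso8859_4", "iso8859_9",
    "iso8859_10", "iso8859_13", "iso8859_14", "iso8859_15", "iso8859_16",
)


def group_by_decoding(raw, codecs=CANDIDATES):
    """Map each distinct decoding of `raw` to the list of codecs producing it.

    Codecs that cannot decode `raw` at all are left out, which by itself
    eliminates candidates.
    """
    groups = defaultdict(list)
    for name in codecs:
        try:
            text = raw.decode(name)
        except (UnicodeError, LookupError):
            continue
        groups[text].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample: 0xE8 is "è" in cp1252/latin_1 but "č" in
    # cp1250/iso8859_2, so it splits the candidates into separate groups.
    for text, names in group_by_decoding(b"Qu\xe9bec \xe8").items():
        print(repr(text), names)
```

Once only one reading looks plausible, the matching codec name can be passed to `open(path, encoding=...)`. On Western European Windows machines that is very often cp1252, but that is a guess about the sender's setup, not something the single byte 0xE9 proves.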