Extended ASCII

Jon Ribbens jon+usenet at unequivocal.eu
Fri Jan 13 17:43:24 EST 2017


On 2017-01-13, D'Arcy Cain <darcy at VybeNetworks.com> wrote:
> I thought I was done with this crap once I moved to 3.x but some 
> Winblows machines are still sending what some circles call "Extended 
> ASCII".  I have a file that I am trying to read and it is barfing on 
> some characters.  For example:
>
>    due to the Qu\xe9bec government
>
> Obviously should be "due to the Québec government".  I can't figure out 
> what that encoding is or if it is anything that can even be understood 
> outside of M$.

$ cat decode.py
#!/usr/bin/env python3

CODECS = (
    "ascii", "big5", "big5hkscs", "cp037", "cp273", "cp424", "cp437", "cp500",
    "cp720", "cp737", "cp775", "cp850", "cp852", "cp855", "cp856", "cp857",
    "cp858", "cp860", "cp861", "cp862", "cp863", "cp864", "cp865", "cp866",
    "cp869", "cp874", "cp875", "cp932", "cp949", "cp950", "cp1006", "cp1026",
    "cp1125", "cp1140", "cp1250", "cp1251", "cp1252", "cp1253", "cp1254",
    "cp1255", "cp1256", "cp1257", "cp1258", "cp65001", "euc_jp",
    "euc_jis_2004", "euc_jisx0213", "euc_kr", "gb2312", "gbk", "gb18030", "hz",
    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",
    "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr", "latin_1", "iso8859_2",
    "iso8859_3", "iso8859_4", "iso8859_5", "iso8859_6", "iso8859_7",
    "iso8859_8", "iso8859_9", "iso8859_10", "iso8859_11", "iso8859_13",
    "iso8859_14", "iso8859_15", "iso8859_16", "johab", "koi8_r", "koi8_t",
    "koi8_u", "kz1048", "mac_cyrillic", "mac_greek", "mac_iceland",
    "mac_latin2", "mac_roman", "mac_turkish", "ptcp154", "shift_jis",
    "shift_jis_2004", "shift_jisx0213", "utf_32", "utf_32_be", "utf_32_le",
    "utf_16", "utf_16_be", "utf_16_le", "utf_7", "utf_8", "utf_8_sig",
)

for encoding in CODECS:
    try:
        if b"Qu\xe9bec".decode(encoding) == "Québec":
            print(encoding)
    except (UnicodeError, LookupError):
        pass

$ ./decode.py 
cp1250
cp1252
cp1254
cp1256
cp1257
cp1258
latin_1
iso8859_2
iso8859_3
iso8859_4
iso8859_9
iso8859_10
iso8859_13
iso8859_14
iso8859_15
iso8859_16


More information about the Python-list mailing list