[Python-ideas] [issue33865] [EASY] Missing code page aliases: "unknown encoding: 874"
Stephen J. Turnbull
turnbull.stephen.fw at u.tsukuba.ac.jp
Sun Jun 17 08:02:02 EDT 2018
Folks. There are standards. "1252" *is not* an alias for
"windows-1252" according to the IANA, while "866" *is* an alias for
"IBM866" according to the same authority. Most 3-digit "IBMxxx" ARE
aliased to both "cpxxx" and just "xxx", but not all. None of
"IBM874", "874", or "cp874" exists according to the IANA.
For the reasons Steven gave, I would say omit the digits-only aliases,
but if we must use them because "there's a standard" (or backward
compatibility), we should stick to those defined by standard, and only
those.
If we're following other standards that I'm unaware of, fine, but
let's cite them rather than randomly introduce a plethora of aliases
because they "look like" an existing (and unfortunate) standard.
There's also some other weirdness with "windows-874", see below. We
(somebody) should check other "windows-xxx" character sets to make
sure they're not misnamed "cpxxx".
Steven D'Aprano writes:
> > It is easy to test it. Encoding/decoding with '874' should give the
> > same result as with 'cp874'.
> I know it is too late to remove that feature, but why do we support
> digit-only IDs for encodings? They can be ambiguous. If Wikipedia is
> correct, cp874 (also known as ibm874) and Windows-874 (also known as
> cp1162) are different:
According to the IANA, they're not necessarily ambiguous. Here is
the entry for IBM866:
    IBM866    2086    IBM NLDG Volume 2              cp866
                      (SE09-8002-03) August 1994     866
where the entries in column 4 show the registered aliases. There are
at least a dozen IBMxxx character sets with 'xxx' aliases.
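
As a rough way to see which of these labels a given Python build
accepts, and whether two labels land on the same codec, something like
the following sketch could be used. The helper names canonical_name and
same_codec are made up for illustration; only codecs.lookup is the
stdlib API. What gets printed depends on the Python version, so no
output is shown.

```python
import codecs


def canonical_name(label):
    """Return the codec name Python resolves `label` to, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def same_codec(a, b):
    """True only if both labels are known and resolve to the same codec."""
    name_a = canonical_name(a)
    return name_a is not None and name_a == canonical_name(b)


# Labels discussed in this thread; results vary by Python version.
for label in ("IBM866", "cp866", "866", "IBM874", "cp874", "874"):
    print(label, "->", canonical_name(label))
```

This only reports what Python's alias table does; it says nothing about
whether a label is registered with the IANA.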
I don't understand what's with "cp874", though. We can surely take
that one back, although we'd better hurry if it's in 3.7rc. We might
want to add "windows-874" (which doesn't seem to be present in Python
3.6), since that's the standard character set name per IANA.
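
If a "windows-874" name does get added, the test suggested in the
quoted text (same results as "cp874") can be done byte by byte. The
sketch below is one way to do that for single-byte codecs; the function
name decoding_differences is hypothetical, and it assumes undefined
bytes should simply compare as U+FFFD. It returns None when either
label is unknown to the running Python, which is what the message above
suggests would happen for "windows-874" on 3.6.

```python
import codecs


def decoding_differences(a, b):
    """List the byte values (0-255) that decode differently under a and b.

    Returns None if either label is not a known encoding. Intended for
    single-byte code pages only; undefined bytes decode to U+FFFD.
    """
    try:
        codecs.lookup(a)
        codecs.lookup(b)
    except LookupError:
        return None
    differing = []
    for value in range(256):
        raw = bytes([value])
        if raw.decode(a, errors="replace") != raw.decode(b, errors="replace"):
            differing.append(value)
    return differing


# Result depends on whether this Python knows "windows-874" at all.
print(decoding_differences("windows-874", "cp874"))
```

An empty list would mean the two labels decode every byte identically;
a non-empty list points at the exact positions where they diverge.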
The confusion between cp874 and windows-874 may be because in
VENDORS/MICSFT/WINDOWS it's in CP874.TXT (as are all the code pages
I don't know where Wikipedia's information comes from, but it's not
Associate Professor Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information
Email: turnbull at sk.tsukuba.ac.jp University of Tsukuba
Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN