[New-bugs-announce] [issue2066] Adding new CNS11643 support, a *huge* charset, in cjkcodecs

Hye-Shik Chang report at bugs.python.org
Mon Feb 11 12:58:54 CET 2008

New submission from Hye-Shik Chang:

This patch adds CNS11643 support to Python's Unicode codecs.
CNS11643 is a huge character set that is used in EUC-TW and ISO-2022-CN.
CJKCodecs has supported CNS11643 for at least 4 years,
but I dropped it when integrating into Python because of its huge size.
EUC-TW and ISO-2022-CN aren't widely used, but they are
still regarded as major encodings.

With my patch, CNS11643 support can be disabled on lightweight
platforms by adding -DNO_CNS11643 to CFLAGS. The mapping source
code for the charset is 900K, and it adds about 350K to
_codecs_tw.so (on POSIX) or python26.dll (on Win32).
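
Since a build made with -DNO_CNS11643 would lack the charset, calling
code may want to probe for the codecs at run time instead of assuming
they exist. The following is only a rough sketch of that idea, not part
of the patch: the codec names "euc_tw" and "iso2022_cn" are assumptions
(the names the patch actually registers are not stated here), and
codec_available() and pick_codec() are hypothetical helpers.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can look up a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Unknown codec, e.g. the charset was compiled out of this build.
        return False
    return True


# Assumed codec names; the patch description does not state them.
CANDIDATES = ("euc_tw", "iso2022_cn")


def pick_codec(candidates=CANDIDATES, fallback="utf-8"):
    """Return the first available candidate codec name, else `fallback`."""
    for name in candidates:
        if codec_available(name):
            return name
    return fallback


if __name__ == "__main__":
    for name in CANDIDATES:
        print(name, "available" if codec_available(name) else "missing")
```

Probing with codecs.lookup() keeps the check independent of how the
interpreter was built, so the same script runs on both kinds of builds.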

What do you think about adding this code?

components: Unicode
files: cns11643-r1.diff.gz
messages: 62282
nosy: hyeshik.chang
priority: low
severity: normal
status: open
title: Adding new CNS11643 support, a *huge* charset, in cjkcodecs
versions: Python 2.6, Python 3.0
Added file: http://bugs.python.org/file9408/cns11643-r1.diff.gz

Tracker <report at bugs.python.org>
