[Python-Dev] Planned updates for cjkcodecs before 2.4a1

Hye-Shik Chang perky at i18n.org
Wed Jun 16 05:17:18 EDT 2004


I am planning a few updates to cjkcodecs before 2.4 alpha1 is out.
If you have any opinions or objections, please tell me.

1. Update JIS X 0213 to its first amendment (a.k.a. JIS X 0213:2004).
   This will introduce three new encodings: euc-jis-2004, shift_jis-2004
   and iso-2022-jp-2004.  They differ little from their predecessors,
   but we may need to keep both versions because of incompatibilities
   and the encoding name changes.  (This won't bloat the code size
   much; I expect around 3~5K.)

2. Merge two or three similar C codecs into one.  Currently we have one
   C codec module for each Python codec.  My idea is to merge them into
   a few groups of similar codecs, so that the code common to the .so
   binaries is no longer duplicated:

     _codecs_jacodecs_1.so: euc-jp, shift-jis, iso-2022-jp,
                            iso-2022-jp-1, iso-2022-jp-ext
     _codecs_jacodecs_2.so: euc-jisx0213, shift-jisx0213, iso-2022-jp-3,
                            euc-jis-2004, shift-jis-2004,
                            iso-2022-jp-2004
     _codecs_jacodecs_3.so: iso-2022-jp-2
     _codecs_kocodecs_1.so: euc-kr, johab, iso-2022-kr
     _codecs_kocodecs_2.so: cp949
     _codecs_zhcodecs_1.so: gb2312, gbk, gb18030, hz
     _codecs_zhcodecs_2.so: big5, cp950

3. Split some of the mapping-data modules into a few group-based
   modules.  This will save memory and disk space for those who need
   only legacy codecs (e.g. "euc-kr only").

     _codecs_mapdata_ko_KR ->
         _codecs_komapdata_1.so: KS X 1001
         _codecs_komapdata_2.so: cp949

     _codecs_mapdata_ja_JP ->
         _codecs_jamapdata_1.so: JIS X 0208, JIS X 0212
         _codecs_jamapdata_2.so: JIS X 0213:2000 and :2004

     _codecs_mapdata_zh_CN ->
         _codecs_zhmapdata_1.so: gb2312, gbk, gb18030

     _codecs_mapdata_zh_TW ->
         _codecs_zhmapdata_2.so: big5, cp950
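
As a rough illustration of item 1, the sketch below checks whether an
interpreter can resolve the proposed encoding names.  The helpers
codec_available and encodable are made up for this example, and the
encoding names are spelled as proposed above, so the final spelling may
differ.

```python
import codecs


def codec_available(name):
    """Return True if Python can resolve the given encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


def encodable(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# Encoding names as proposed in item 1 (an assumption about the final names).
proposed = ["euc-jis-2004", "shift_jis-2004", "iso-2022-jp-2004"]
for name in proposed:
    print(name, codec_available(name))
```

Running encodable() on the same text with an old and a new encoding is
one way to spot the incompatibilities mentioned in item 1.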

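The grouping in item 2 can be written down as a small table.  This is
only a sketch: the module names are the proposed ones, and CODEC_GROUPS,
MODULE_FOR and module_for are hypothetical names used for illustration,
not part of cjkcodecs.

```python
# Grouping copied from the list in item 2; the module names are proposals.
CODEC_GROUPS = {
    "_codecs_jacodecs_1": ["euc-jp", "shift-jis", "iso-2022-jp",
                           "iso-2022-jp-1", "iso-2022-jp-ext"],
    "_codecs_jacodecs_2": ["euc-jisx0213", "shift-jisx0213", "iso-2022-jp-3",
                           "euc-jis-2004", "shift-jis-2004",
                           "iso-2022-jp-2004"],
    "_codecs_jacodecs_3": ["iso-2022-jp-2"],
    "_codecs_kocodecs_1": ["euc-kr", "johab", "iso-2022-kr"],
    "_codecs_kocodecs_2": ["cp949"],
    "_codecs_zhcodecs_1": ["gb2312", "gbk", "gb18030", "hz"],
    "_codecs_zhcodecs_2": ["big5", "cp950"],
}

# Reverse index: encoding name -> module that would provide it.
MODULE_FOR = {enc: mod for mod, encs in CODEC_GROUPS.items() for enc in encs}


def module_for(encoding):
    """Return the proposed module for an encoding, or None if unlisted."""
    return MODULE_FOR.get(encoding.lower())


print(len(MODULE_FOR), "encodings in", len(CODEC_GROUPS), "modules")
```

With this grouping, the 22 encodings listed would be served by 7 shared
objects instead of one per codec.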

If these sound acceptable to python-dev people, they will be
implemented in CJKCodecs 1.1 first and imported into Python later
(before 2.4a1).


Hye-Shik