[Patches] [ python-Patches-873597 ] The cjkcodecs integration

SourceForge.net noreply at sourceforge.net
Sat Jan 17 09:47:11 EST 2004


Patches item #873597, was opened at 2004-01-09 16:55
Message generated for change (Comment added) made by perky
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=873597&group_id=5470

Category: Library (Lib)
Group: Python 2.4
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Hye-Shik Chang (perky)
Assigned to: Hye-Shik Chang (perky)
Summary: The cjkcodecs integration

Initial Comment:
(finally :)

CJKCodecs includes support for many East Asian legacy
encodings:

* Chinese (PRC): gb2312 gbk gb18030 hz
* Chinese (ROC): big5 cp950
* Japanese: cp932 shift-jis shift-jisx0213 euc-jp
euc-jisx0213 iso-2022-jp iso-2022-jp-1 iso-2022-jp-2
iso-2022-jp-3 iso-2022-jp-ext
* Korean: cp949 euc-kr johab iso-2022-kr
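
As a rough illustration of what the codecs listed above provide, the sketch below round-trips a short sample string through a few of them. It assumes a modern Python 3 interpreter, where these codecs ship in the standard library; the sample strings and the `round_trip` helper are illustrative and not part of the patch.

```python
# Sample text per encoding; the strings are illustrative assumptions,
# chosen so that every character exists in the target character set.
SAMPLES = {
    "gb2312": "中文",
    "gbk": "中文",
    "gb18030": "中文",
    "hz": "中文",
    "big5": "中文",
    "shift-jis": "日本語",
    "euc-jp": "日本語",
    "iso-2022-jp": "日本語",
    "euc-kr": "한글",
    "cp949": "한글",
}


def round_trip(text, encoding):
    """Encode text with the named codec, then decode the bytes back."""
    encoded = text.encode(encoding)
    return encoded, encoded.decode(encoding)


results = {}
for encoding, text in SAMPLES.items():
    encoded, decoded = round_trip(text, encoding)
    # Record the encoded length and whether the text survived unchanged.
    results[encoding] = (len(encoded), decoded == text)
    print(encoding, len(encoded), "bytes, round trip ok:", decoded == text)
```

The stateful encodings (hz, iso-2022-jp) produce longer byte strings than the others because they wrap the text in escape sequences.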

Integrating CJKCodecs into the main Python distribution
will make the default installation package more
comfortable for CJK users.

And it's not as big as you might guess. :)

It increases the source size by only 2%:

% du -d0 -k python
37714   python
% du -d0 -k python+cjkcodecs
38504   python+cjkcodecs
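
The 2% figure follows directly from the two `du` totals above. A minimal check of the arithmetic, where the function name `percent_growth` is only an illustrative choice:

```python
def percent_growth(before_kb, after_kb):
    """Relative size increase, in percent, from before_kb to after_kb."""
    return (after_kb - before_kb) * 100.0 / before_kb


# Kilobyte totals copied from the du output above.
growth = percent_growth(37714, 38504)
print(f"{growth:.2f}%")  # prints 2.09%
```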

And it bloats only 4% by source lines:

% echo `find python.cjkcodecs -type f -exec cat {} \;|wc -l` "*100/" `find python -type f -exec cat {} \;|wc -l` "-100" | bc
4
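
For readers who do not have `find` and `bc` at hand, the same measurement can be sketched in Python. This is an approximation under stated assumptions: `count_lines` and `percent_increase` are hypothetical helper names, line counts are taken as newline counts (as `wc -l` does), and the integer division mirrors the default behaviour of `bc`.

```python
import os


def count_lines(root):
    """Count newline characters in every regular file under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror `find -type f`: skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                total += handle.read().count(b"\n")
    return total


def percent_increase(base_lines, new_lines):
    """Integer percentage growth, truncated the way bc truncates."""
    return new_lines * 100 // base_lines - 100
```

A call such as `percent_increase(count_lines("python"), count_lines("python+cjkcodecs"))` would then correspond to the shell pipeline, with the directory names adjusted to the local checkout.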


----------------------------------------------------------------------

>Comment By: Hye-Shik Chang (perky)
Date: 2004-01-17 23:47

Message:
Logged In: YES 
user_id=55188

Okay. Committed as:

Modified files:

Doc/lib/libcodecs.tex 1.27
Lib/email/test/test_email_codecs.py 1.5
Lib/encodings/aliases.py 1.21
Modules/Setup.dist 1.43
Lib/test/regrtest.py 1.151
setup.py 1.181


Added files:

Lib/encodings/big5.py
Lib/encodings/cp932.py
Lib/encodings/cp949.py
Lib/encodings/cp950.py
Lib/encodings/euc_jisx0213.py
Lib/encodings/euc_jp.py
Lib/encodings/euc_kr.py
Lib/encodings/gb18030.py
Lib/encodings/gb2312.py
Lib/encodings/gbk.py
Lib/encodings/iso2022_jp.py
Lib/encodings/iso2022_jp_1.py
Lib/encodings/iso2022_jp_2.py
Lib/encodings/iso2022_jp_3.py
Lib/encodings/iso2022_jp_ext.py
Lib/encodings/iso2022_kr.py
Lib/encodings/johab.py
Lib/encodings/shift_jis.py
Lib/encodings/shift_jisx0213.py
Lib/test/cjkencodings_test.py
Lib/test/test_codecencodings_cn.py
Lib/test/test_codecencodings_jp.py
Lib/test/test_codecencodings_kr.py
Lib/test/test_codecencodings_tw.py
Lib/test/test_codecmaps_cn.py
Lib/test/test_codecmaps_jp.py
Lib/test/test_codecmaps_kr.py
Lib/test/test_codecmaps_tw.py
Lib/test/test_multibytecodec.py
Lib/test/test_multibytecodec_support.py
Modules/cjkcodecs/README
Modules/cjkcodecs/_big5.c
Modules/cjkcodecs/_cp932.c
Modules/cjkcodecs/_cp949.c
Modules/cjkcodecs/_cp950.c
Modules/cjkcodecs/_euc_jisx0213.c
Modules/cjkcodecs/_euc_jp.c
Modules/cjkcodecs/_euc_kr.c
Modules/cjkcodecs/_gb18030.c
Modules/cjkcodecs/_gb2312.c
Modules/cjkcodecs/_gbk.c
Modules/cjkcodecs/_hz.c
Modules/cjkcodecs/_iso2022_jp.c
Modules/cjkcodecs/_iso2022_jp_1.c
Modules/cjkcodecs/_iso2022_jp_2.c
Modules/cjkcodecs/_iso2022_jp_3.c
Modules/cjkcodecs/_iso2022_jp_ext.c
Modules/cjkcodecs/_iso2022_kr.c
Modules/cjkcodecs/_johab.c
Modules/cjkcodecs/_shift_jis.c
Modules/cjkcodecs/_shift_jisx0213.c
Modules/cjkcodecs/alg_iso8859_1.h
Modules/cjkcodecs/alg_iso8859_7.h
Modules/cjkcodecs/alg_jisx0201.h
Modules/cjkcodecs/cjkcommon.h
Modules/cjkcodecs/codeccommon.h
Modules/cjkcodecs/codecentry.h
Modules/cjkcodecs/iso2022common.h
Modules/cjkcodecs/map_big5.h
Modules/cjkcodecs/map_cp932ext.h
Modules/cjkcodecs/map_cp949.h
Modules/cjkcodecs/map_cp949ext.h
Modules/cjkcodecs/map_cp950ext.h
Modules/cjkcodecs/map_gb18030ext.h
Modules/cjkcodecs/map_gb18030uni.h
Modules/cjkcodecs/map_gb2312.h
Modules/cjkcodecs/map_gbcommon.h
Modules/cjkcodecs/map_gbkext.h
Modules/cjkcodecs/map_jisx0208.h
Modules/cjkcodecs/map_jisx0212.h
Modules/cjkcodecs/map_jisx0213.h
Modules/cjkcodecs/map_jisx0213_pairs.h
Modules/cjkcodecs/map_jisxcommon.h
Modules/cjkcodecs/map_ksx1001.h
Modules/cjkcodecs/mapdata_ja_JP.c
Modules/cjkcodecs/mapdata_ko_KR.c
Modules/cjkcodecs/mapdata_zh_CN.c
Modules/cjkcodecs/mapdata_zh_TW.c
Modules/cjkcodecs/multibytecodec.c
Modules/cjkcodecs/multibytecodec.h
Modules/cjkcodecs/tweak_gbk.h

Thank you! :-)

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-01-10 06:24

Message:
Logged In: YES 
user_id=21627

These changes look good to me, please apply them.

As for the regrtest modification, please change the tests to
provide a skip_expected setting, which is computed depending
on the presence of the test data - see test_normalization.py
for an example.
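
The idea of deciding whether to skip from the presence of the test data can be sketched with the modern `unittest` API. This is not the regrtest `skip_expected` mechanism mentioned above, and it does not reproduce test_normalization.py; the file name `MAPPING_FILE` and the test class are hypothetical.

```python
import os
import unittest

# Hypothetical name of an optional data file; not a path from the patch.
MAPPING_FILE = "CJK-MAPPING-DATA-EXAMPLE.TXT"


def data_available(path=MAPPING_FILE):
    """Return True when the optional test data file exists."""
    return os.path.exists(path)


class MappingDataTest(unittest.TestCase):
    # The skip decision is computed once, when the class is defined.
    @unittest.skipUnless(data_available(), "mapping data file not present")
    def test_mapping_file_is_nonempty(self):
        self.assertGreater(os.path.getsize(MAPPING_FILE), 0)


def run_tests():
    """Run the test case and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MappingDataTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

When the data file is absent, the test is reported as skipped rather than failed, which is the behaviour the comment asks for.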

It would be good if the header files containing large tables
would contain an indication of how these tables have been
created (e.g. what data sources have been used, and what
modifications have been applied after the tables were created
from the sources).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-01-10 06:10

Message:
Logged In: YES 
user_id=21627

Can you please make that server report the file type as
application/octet-stream?

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-09 17:00

Message:
Logged In: YES 
user_id=55188

Hmm. SF seems not to accept big patches (385KB), so
I uploaded the patch to
http://people.freebsd.org/~perky/pythoncjkcodecs.diff.bz2 

----------------------------------------------------------------------
