Hye-Shik Chang wrote:
I have planned a few things to update in cjkcodecs before 2.4 alpha1 is out. If you have any opinions or objections, please tell me.
1. Update JIS X 0213 to its first amendment (a.k.a. JIS X 0213:2004). This will introduce three new encodings: euc-jis-2004, shift_jis-2004 and iso-2022-jp-2004. They differ little from their predecessors, but we may need to keep both versions because of incompatibilities and the encoding name change. (This won't bloat the code size much; I expect around 3~5K.)
+1
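As a rough illustration of item 1, the sketch below round-trips a short string through the three new encodings. It assumes a current CPython in which these codecs are registered under the names given in the proposal; the sample text and the helper `round_trip` are made up for the example.

```python
import codecs

# Sample text; every character here is also in JIS X 0208 (assumption: any
# ordinary Japanese string would do for this check).
TEXT = "日本語のテキスト"

# Encoding names as spelled in the proposal; codec lookup normalizes
# hyphens and case, so these spellings resolve in current CPython.
NEW_ENCODINGS = ["euc-jis-2004", "shift_jis-2004", "iso-2022-jp-2004"]


def round_trip(text, encoding):
    """Encode and decode `text` with `encoding` (hypothetical helper)."""
    return text.encode(encoding).decode(encoding)


for name in NEW_ENCODINGS:
    info = codecs.lookup(name)  # raises LookupError if the codec is missing
    assert round_trip(TEXT, name) == TEXT
    print(name, "is registered as", info.name)
```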
2. Merge two or three similar C codecs into one. We currently have one C codec for each Python codec. My idea is to merge them into several groups of similar codecs, which saves the code that the .so binaries have in common:
_codecs_jacodecs_1.so: euc-jp, shift-jis, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-ext
_codecs_jacodecs_2.so: euc-jisx0213, shift-jisx0213, iso-2022-jp-3, euc-jis-2004, shift-jis-2004, iso-2022-jp-2004
_codecs_jacodecs_3.so: iso-2022-jp-2
_codecs_kocodecs_1.so: euc-kr, johab, iso-2022-kr
_codecs_kocodecs_2.so: cp949
_codecs_zhcodecs_1.so: gb2312, gbk, gb18030, hz
_codecs_zhcodecs_2.so: big5, cp950
+1, but why not put all Japanese codecs into one module, and ditto for the Korean and Chinese ones? Note that today's OS linkers will only mmap those pieces of code into the process memory that are actually needed by the application, so even though the size of the modules increases, the application's process memory footprint is likely not to increase.
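To make the two layouts easier to compare, here is a small sketch that writes the proposed grouping as a lookup table and then collapses it into one module per language, as suggested in the reply. The module and codec names are copied from the proposal; the helpers `module_for` and `merge_by_language` are hypothetical and exist only for this illustration.

```python
# Proposed grouping, copied from the proposal (module name -> codecs).
PROPOSED_GROUPS = {
    "_codecs_jacodecs_1": ["euc-jp", "shift-jis", "iso-2022-jp",
                           "iso-2022-jp-1", "iso-2022-jp-ext"],
    "_codecs_jacodecs_2": ["euc-jisx0213", "shift-jisx0213", "iso-2022-jp-3",
                           "euc-jis-2004", "shift-jis-2004",
                           "iso-2022-jp-2004"],
    "_codecs_jacodecs_3": ["iso-2022-jp-2"],
    "_codecs_kocodecs_1": ["euc-kr", "johab", "iso-2022-kr"],
    "_codecs_kocodecs_2": ["cp949"],
    "_codecs_zhcodecs_1": ["gb2312", "gbk", "gb18030", "hz"],
    "_codecs_zhcodecs_2": ["big5", "cp950"],
}


def module_for(encoding, groups=PROPOSED_GROUPS):
    """Return the proposed module that would hold `encoding`."""
    for module, encodings in groups.items():
        if encoding in encodings:
            return module
    raise LookupError(encoding)


def merge_by_language(groups=PROPOSED_GROUPS):
    """Collapse the numbered groups into one module per language prefix."""
    merged = {}
    for module, encodings in groups.items():
        # Drop the trailing "_1", "_2", ... to get the per-language name.
        language_module = module.rsplit("_", 1)[0]
        merged.setdefault(language_module, []).extend(encodings)
    return merged


if __name__ == "__main__":
    print(module_for("cp949"))
    for module, encodings in merge_by_language().items():
        print(module, len(encodings), "codecs")
```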
3. Split some mapping keeper modules into a few group-based modules. This will save memory and disk space for those who need only legacy codecs, e.g. "euc-kr only".
_codecs_mapdata_ko_KR ->
    _codecs_komapdata_1.so: KS X 1001
    _codecs_komapdata_2.so: cp949
_codecs_mapdata_ja_JP ->
    _codecs_jamapdata_1.so: JIS X 0208, JIS X 0212
    _codecs_jamapdata_2.so: JIS X 0213:2000 and :2004
_codecs_mapdata_zh_CN ->
    _codecs_zhmapdata_1.so: gb2312, gbk, gb18030
_codecs_mapdata_zh_TW ->
    _codecs_zhmapdata_2.so: big5, cp950
-1. See above: this is static C data, so splitting these won't really buy the user anything. If you don't believe this, compare the resident size of Python with and without unicodedata loaded. The difference on my machine is a measly 30kB, not the 250kB of the complete module.
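One way to try this comparison is sketched below. It is only an approximation: it reports the peak resident set size of a fresh interpreter via the Unix-only `resource` module rather than the current resident size, the unit of `ru_maxrss` depends on the platform (kilobytes on Linux, bytes on macOS), and the numbers vary from run to run. The helper name `peak_rss` is made up for the example.

```python
import subprocess
import sys

# Script run in a child interpreter: execute one statement, then report the
# child's own peak resident set size.
_PROBE = (
    "{stmt}\n"
    "import resource\n"
    "print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)\n"
)


def peak_rss(stmt="pass"):
    """Return ru_maxrss of a fresh interpreter after running `stmt`.

    Hypothetical helper; Unix-only because it relies on `resource`.
    """
    result = subprocess.run(
        [sys.executable, "-c", _PROBE.format(stmt=stmt)],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    baseline = peak_rss()
    with_module = peak_rss("import unicodedata")
    print("baseline:        ", baseline)
    print("with unicodedata:", with_module)
    print("difference:      ", with_module - baseline)
```

Because each measurement runs in its own process, a single pair of numbers is noisy; repeating the runs and comparing typical values gives a fairer picture.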
If these sound acceptable for python-dev people, they will be implemented as CJKCodecs 1.1 first and imported into python later (before 2.4a1).
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jun 16 2004)