On Wed, Jun 16, 2004 at 11:33:59AM +0200, M.-A. Lemburg wrote:
Hye-Shik Chang wrote: [snip]
2. Merge two or three similar C codecs into one. Currently we have one C codec for each Python codec. My idea is to merge them into several groups of similar codecs, so that the parts common to the .so binaries are shared and much space is saved:
  _codecs_jacodecs_1.so: euc-jp, shift-jis, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-ext
  _codecs_jacodecs_2.so: euc-jisx0213, shift-jisx0213, iso-2022-jp-3, euc-jis-2004, shift-jis-2004, iso-2022-jp-2004
  _codecs_jacodecs_3.so: iso-2022-jp-2
  _codecs_kocodecs_1.so: euc-kr, johab, iso-2022-kr
  _codecs_kocodecs_2.so: cp949
  _codecs_zhcodecs_1.so: gb2312, gbk, gb18030, hz
  _codecs_zhcodecs_2.so: big5, cp950
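For illustration only, the proposed grouping can be written down as a lookup table. This is a minimal sketch: the module names and encoding lists are taken from the proposal above, while the names PROPOSED_GROUPS, ENCODING_TO_MODULE and module_for() are made up here and are not part of cjkcodecs.

```python
# Hypothetical table of the grouping proposed above. The module names come
# from the proposal; the variable and function names are invented for this sketch.
PROPOSED_GROUPS = {
    "_codecs_jacodecs_1": ["euc-jp", "shift-jis", "iso-2022-jp",
                           "iso-2022-jp-1", "iso-2022-jp-ext"],
    "_codecs_jacodecs_2": ["euc-jisx0213", "shift-jisx0213", "iso-2022-jp-3",
                           "euc-jis-2004", "shift-jis-2004", "iso-2022-jp-2004"],
    "_codecs_jacodecs_3": ["iso-2022-jp-2"],
    "_codecs_kocodecs_1": ["euc-kr", "johab", "iso-2022-kr"],
    "_codecs_kocodecs_2": ["cp949"],
    "_codecs_zhcodecs_1": ["gb2312", "gbk", "gb18030", "hz"],
    "_codecs_zhcodecs_2": ["big5", "cp950"],
}

# Invert the table once so each encoding maps to the single module holding it.
ENCODING_TO_MODULE = {
    enc: mod for mod, encs in PROPOSED_GROUPS.items() for enc in encs
}


def module_for(encoding):
    """Return the proposed module name for an encoding (KeyError if unknown)."""
    # Normalize case and underscores, e.g. "EUC_JP" becomes "euc-jp".
    return ENCODING_TO_MODULE[encoding.lower().replace("_", "-")]
```

With such a table, somebody who needs only the legacy Japanese encodings could see at a glance that _codecs_jacodecs_1 is the only module to keep.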
+1, but why not put all Japanese codecs into one module, and ditto for the Korean and Chinese ones?
Note that today's OS linkers will only mmap those pieces of code into the process memory that are actually needed by the application, so even though the size of the modules increases, the application's process memory footprint is not likely to increase.
Okay. But what about embedded or frozen environments, or builds where the codecs are compiled statically into Python by uncommenting lines in Modules/Setup? If somebody needs to support only the legacy Japanese encodings, he will want to include the legacy mapping (70K) but not the JIS X 0213 mapping (85K) or the KS X 1001 and GB2312 mappings (200K, for iso-2022-jp-2). He may also want to save space by simply deleting files. In fact, I don't know how real Japanese developers use them; I am just guessing. :) [snip]
If you don't believe this, compare the resident size of Python with and without unicodedata loaded. The difference on my machine is a measly 30kB, not the 250kB of the complete module.
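A rough way to try the comparison above, as a sketch only: start two fresh interpreters, one of which imports unicodedata, and let each report its peak resident set size. This assumes a Unix-like system, since the standard resource module is not available on Windows; ru_maxrss is reported in kilobytes on Linux and in bytes on macOS, so only compare numbers taken on the same machine. The helper name peak_rss() is made up for this example, and the result will vary between platforms, builds and runs.

```python
import subprocess
import sys

# Script run in a child interpreter: optionally import one module, then
# print the peak resident set size of that process.
CHILD = """
import resource, sys
if len(sys.argv) > 1:
    __import__(sys.argv[1])
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""


def peak_rss(module=None):
    """Peak RSS of a fresh interpreter, optionally after importing `module`.

    The unit is whatever the OS uses for ru_maxrss (kB on Linux, bytes on macOS).
    """
    args = [sys.executable, "-c", CHILD]
    if module:
        args.append(module)
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


if __name__ == "__main__":
    base = peak_rss()
    loaded = peak_rss("unicodedata")
    print("baseline:", base)
    print("with unicodedata:", loaded)
    print("difference:", loaded - base)
```

Because only the pages that are actually touched become resident, the difference should stay well below the on-disk size of the module; doing a few lookups in the child before measuring would pull in more of the tables.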
I do believe this. This is also why I wrote cjkcodecs as C extensions rather than in pure Python.

Hye-Shik