[I18n-sig] Planned updates for cjkcodecs before 2.4a1

Hye-Shik Chang perky at i18n.org
Wed Jun 16 07:16:52 EDT 2004


On Wed, Jun 16, 2004 at 11:33:59AM +0200, M.-A. Lemburg wrote:
> Hye-Shik Chang wrote:
[snip]
> >2. Merge two or three similar C codecs into one.  We currently have
> >   one C codec for each Python codec.  My idea is to merge them
> >   into several groups of similar codecs, so that much of the common
> >   part of the .so binaries will be saved:
> >
> >     _codecs_jacodecs_1.so: euc-jp, shift-jis, iso-2022-jp,
> >                            iso-2022-jp-1, iso-2022-jp-ext
> >     _codecs_jacodecs_2.so: euc-jisx0213, shift-jisx0213, iso-2022-jp-3,
> >                            euc-jis-2004, shift-jis-2004,
> >                            iso-2022-jp-2004
> >     _codecs_jacodecs_3.so: iso-2022-jp-2
> >     _codecs_kocodecs_1.so: euc-kr, johab, iso-2022-kr
> >     _codecs_kocodecs_2.so: cp949
> >     _codecs_zhcodecs_1.so: gb2312, gbk, gb18030, hz
> >     _codecs_zhcodecs_2.so: big5, cp950
> 
> 
> +1, but why not put all Japanese codecs into one module and
> ditto for the Korean and Chinese ones?
> 
> Note that today's OS linkers will only mmap those pieces
> of code into the process memory that are actually needed
> by the application, so even though the size of the modules
> increases, the application process memory footprint is
> not likely to increase.

Okay. But what about embedded or frozen environments, or builds where
the codecs are compiled statically into Python by uncommenting lines in
Modules/Setup?  If somebody needs to support only the legacy Japanese
encodings, he will want to include the legacy mapping (70K) but not the
JIS X 0213 mapping (85K) or the KS X 1001 and GB2312 mappings (200K,
needed for iso-2022-jp-2).  And he may want to save space by simply
deleting files.  In fact, I don't know how real Japanese developers
use them; I am just guessing. :)
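
To see which of these codecs a particular build actually ships, one
could probe the codec registry.  This is only a minimal sketch: the
helper name available_codecs is made up for illustration, and it
assumes the codec names are spelled the way the registry knows them.

```python
import codecs


def available_codecs(names):
    """Map each codec name to True if this Python can look it up."""
    found = {}
    for name in names:
        try:
            codecs.lookup(name)
            found[name] = True
        except LookupError:
            # The codec (or the extension module behind it) is missing.
            found[name] = False
    return found


if __name__ == "__main__":
    # Legacy Japanese encodings named in the proposed grouping.
    legacy_ja = ["euc-jp", "shift-jis", "iso-2022-jp",
                 "iso-2022-jp-1", "iso-2022-jp-ext"]
    for name, ok in available_codecs(legacy_ja).items():
        print(name, "available" if ok else "missing")
```

A stripped-down build that left out one group would simply report the
codecs of that group as missing.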

[snip]
> 
> If you don't believe this, compare the resident size of
> Python with and without unicodedata loaded. The difference
> on my machine is a measly 30kB, not the 250kB of the complete
> module.

I do believe this.  This is also why I wrote cjkcodecs as C extensions
rather than in pure Python.
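
For anyone who wants to repeat the unicodedata comparison, a rough
sketch follows.  It assumes a Unix-like system where the resource
module is available, and it reads the peak resident size (ru_maxrss)
rather than the current one, so it gives only an upper bound on the
growth.  The helper name peak_rss_kb is made up, and the numbers will
differ from machine to machine.

```python
import resource
import sys


def peak_rss_kb():
    """Peak resident set size of this process, in kilobytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return rss // 1024 if sys.platform == "darwin" else rss


before_kb = peak_rss_kb()

import unicodedata  # imported late on purpose, to measure its cost

unicodedata.name("A")  # touch the database so some pages are mapped in

after_kb = peak_rss_kb()
print("peak RSS grew by about", after_kb - before_kb, "kB")
```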


Hye-Shik