[Python-Dev] Adding Japanese Codecs to the distro

Atsuo Ishimoto ishimoto@gembook.org
Wed, 22 Jan 2003 21:50:44 +0900


On Wed, 22 Jan 2003 13:06:47 +0100
"M.-A. Lemburg" <mal@lemburg.com> wrote:

> I was talking about the *installed* size, i.e. the size
> of the package in site-packages:

I'm sorry for my misunderstanding.

> Now, if we took only the C version of Tamito's codec, we'd
> end up with around 1790 - 1252 - 88 = 450 kB. Still a factor of
> 5...
> 

Please try
   strip ./c/_japanese_codecs.so

On my Linux box, this reduces the size of _japanese_codecs.so from 530 KB
to 135 KB. I think this is a reasonable size, because it contains more
tables than Hisao's version.
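
To repeat the measurement on another machine, a small sketch along these
lines may help. The helper names size_after_strip and percent_saved are
made up for illustration, and the example call assumes that binutils
`strip` is on the PATH. The command runs on a temporary copy, so the
installed file is left alone.

```python
import os
import shutil
import subprocess
import tempfile


def size_after_strip(path, strip_cmd=("strip",)):
    """Return (original_bytes, stripped_bytes) for the file at `path`.

    The command runs on a temporary copy, so `path` itself is untouched.
    """
    original = os.path.getsize(path)
    tmp_dir = tempfile.mkdtemp()
    try:
        copy = os.path.join(tmp_dir, os.path.basename(path))
        shutil.copy(path, copy)
        # Append the copy's path as the last argument of the command.
        subprocess.check_call(list(strip_cmd) + [copy])
        stripped = os.path.getsize(copy)
    finally:
        shutil.rmtree(tmp_dir)
    return original, stripped


def percent_saved(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Example call (path as in the command above):
#   before, after = size_after_strip("./c/_japanese_codecs.so")

# The figures reported in this message: 530 KB before, 135 KB after.
print("%.1f%% smaller" % percent_saved(530, 135))
```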

> Hisao's approach uses a single table which fits into 58kB Python
> source code. Boil that down to a static C table and you'll end up
> with something around 10-20kB for static C data. Hisao
> still builds a dictionary using this data, but perhaps that step
> could be avoided using the same techniques that Fredrik used
> in boiling down the size of the unicodedata module (which holds
> the Unicode Database).
> 

Thank you for your advice. I will try it later, if you still think
JapaneseCodec is too large.
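
For reference, the general idea behind that kind of table compression is
to split one large lookup table into fixed-size blocks and store each
distinct block only once, with a small index in front. The following is
only a minimal sketch of that idea, not the code from unicodedata or from
either codec. The names split_table and lookup and the sample values are
made up for illustration and are not real JIS mapping data.

```python
def split_table(table, shift):
    """Split a flat list into (index, blocks), sharing duplicate blocks."""
    size = 1 << shift
    # Pad with zeros so the length is a multiple of the block size.
    padded = list(table) + [0] * (-len(table) % size)
    index = []    # one entry per block: which stored block to use
    blocks = []   # the distinct blocks, concatenated
    seen = {}     # block contents -> position of that block in `blocks`
    for start in range(0, len(padded), size):
        block = tuple(padded[start:start + size])
        if block not in seen:
            seen[block] = len(seen)
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, code):
    """Return the value for `code`, or 0 if it is beyond the table."""
    high = code >> shift
    if high >= len(index):
        return 0
    mask = (1 << shift) - 1
    return blocks[(index[high] << shift) + (code & mask)]


# Made-up sparse table: 4096 slots, only 0x21..0x7e are mapped.
flat = [0] * 0x1000
for code in range(0x21, 0x7F):
    flat[code] = 0x3000 + code

index, blocks = split_table(flat, 4)

# Every lookup through the two small tables matches the flat table.
assert all(lookup(index, blocks, 4, c) == flat[c] for c in range(len(flat)))
print(len(flat), "entries stored as", len(index) + len(blocks))
```

The saving depends on how many blocks are identical, so a sparse mapping
with long unmapped runs benefits the most. In C, both lists would become
static arrays and no dictionary would need to be built at import time.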

--------------------------
Atsuo Ishimoto
ishimoto@gembook.org
Homepage:http://www.gembook.jp