[Python-Dev] Adding Japanese Codecs to the distro

M.-A. Lemburg mal@lemburg.com
Wed, 22 Jan 2003 14:37:08 +0100


Atsuo Ishimoto wrote:
> On Wed, 22 Jan 2003 13:06:47 +0100
> "M.-A. Lemburg" <mal@lemburg.com> wrote:
> 
>>Now, if we took only the C version of Tamito's codec, we'd
>>end up with around 1790 - 1252 - 88 = 450 kB. Still a factor of
>>5...
>>
> 
> Please try
>    strip ./c/_japanese_codecs.so
> 
> On my Linux box, this reduces the size of _japanese_codecs.so from
> 530 KB to 135 KB. I think this is a reasonable size because it
> contains more tables than Hisao's version.

Ok, we're finally approaching a very reasonable size :-)
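
For reference, a quick check of the figures quoted above. This only
redoes the arithmetic on the numbers given in this thread; the variable
names are made up for the example, and no sizes were measured here.

```python
# Figures exactly as quoted in the thread (kB / KB as the posters wrote them).
c_only_estimate_kb = 1790 - 1252 - 88   # estimate for the C version alone
unstripped_kb = 530                     # _japanese_codecs.so before strip
stripped_kb = 135                       # _japanese_codecs.so after strip

saved_kb = unstripped_kb - stripped_kb          # size removed by strip
shrink_factor = unstripped_kb / stripped_kb     # how many times smaller

print(c_only_estimate_kb, saved_kb, round(shrink_factor, 1))
```

So, going by the quoted numbers, strip alone makes the shared library
a bit less than 4 times smaller.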

BTW, why is it that Hisao can use one table for all supported
encodings, whereas Tamito uses 6 tables?

>>Hisao's approach uses a single table which fits into 58kB Python
>>source code. Boil that down to a static C table and you'll end up
>>with something around 10-20kB for static C data. Hisao still
>>builds a dictionary using this data, but perhaps that step
>>could be avoided using the same techniques that Fredrik used
>>in boiling down the size of the unicodedata module (which holds
>>the Unicode Database).
> 
> Thank you for your advice. I will try it later, if you still think
> JapaneseCodec is too large.

That would be great, thanks !
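
To make the idea concrete, here is a minimal sketch of a two-level
table split, which is the general kind of trick I have in mind for
replacing a runtime dictionary with static data. It is not taken from
the unicodedata generator or from either codec: the function names
(split_table, lookup), the block size, and the toy mapping data are all
assumptions made up for illustration.

```python
def split_table(table, shift):
    """Split a flat table into (index, blocks), storing each distinct
    block of 2**shift entries only once."""
    size = 1 << shift
    assert len(table) % size == 0
    index = []    # one entry per block of the original table
    blocks = []   # concatenation of the distinct blocks
    seen = {}     # block contents -> block number inside `blocks`
    for start in range(0, len(table), size):
        block = tuple(table[start:start + size])
        pos = seen.get(block)
        if pos is None:
            pos = len(blocks) >> shift
            seen[block] = pos
            blocks.extend(block)
        index.append(pos)
    return index, blocks


def lookup(index, blocks, shift, code):
    """Return the same value as table[code] using the two small tables."""
    mask = (1 << shift) - 1
    return blocks[(index[code >> shift] << shift) + (code & mask)]


# Toy data: a mostly empty 64K table with one small mapped range.
table = [0] * 0x10000
for i, code in enumerate(range(0x3041, 0x3094)):
    table[code] = 0xA4A1 + i

SHIFT = 7
index, blocks = split_table(table, SHIFT)

# Every lookup must agree with the flat table.
assert all(lookup(index, blocks, SHIFT, c) == table[c]
           for c in range(len(table)))
print(len(table), len(index) + len(blocks))
```

Both resulting lists are plain sequences of small integers, so they
could be emitted as static C arrays and indexed directly, with no
dictionary built at import time. How much this saves on the real
codec tables depends on how sparse and repetitive they are.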

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH