[Python-Dev] Adding Japanese Codecs to the distro

M.-A. Lemburg mal@lemburg.com
Wed, 22 Jan 2003 13:06:47 +0100


Atsuo Ishimoto wrote:
> On Wed, 22 Jan 2003 10:29:54 +0100
> "M.-A. Lemburg" <mal@lemburg.com> wrote:
> 
>>The problem I see is size: Tamito's codecs have an installed
>>size of 1790kB while Hisao's codecs are around 81kB.
>>
> 
> You cannot compare the size of untarred files here.

I was talking about the *installed* size, i.e. the size
of the package in site-packages:

degas site-packages/japanese# du
337     ./c
1252    ./mappings
88      ./python
8       ./aliases
1790    .

Hisao's Python codec is only 85kB in size.

Now, if we took only the C version of Tamito's codec, we'd
end up with around 1790 - 1252 - 88 = 450 kB. Still a factor of
5...
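
As a rough check of that arithmetic, here is a small Python sketch. The
numbers come from the du listing and the 85kB figure above; the variable
names are only illustrative.

```python
# Installed sizes in kB, copied from the du output above.
du_kb = {"c": 337, "mappings": 1252, "python": 88, "aliases": 8}
total_kb = 1790   # du total for the whole package directory
hisao_kb = 85     # size quoted above for Hisao's pure-Python codec

# Drop the mapping tables and the Python version, keep everything else.
c_only_kb = total_kb - du_kb["mappings"] - du_kb["python"]
ratio = c_only_kb / hisao_kb

print("C-only install: %d kB, about %.1fx Hisao's codec" % (c_only_kb, ratio))
```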

I wonder whether it wouldn't be possible to use the same tricks
Hisao used in his codec for a C version.

> Tamito's codecs package
> contains source of C version and Python version.  About 1 MB in 1790kB
> is size of C sources.
> 
> So, I'm proposing to add only C version of codec from JapaneseCodecs
> package. As I mentioned, size of C version is about 160 KB in Win32
> binary form, excluding tests and documentations. I don't see a
> significant difference between them.
> 
> If the size of the C sources (about 1 MB) matters, we may be able to reduce it.

The source code size is not that important. The install size
is, and even more so the memory footprint.

Hisao's approach uses a single table which fits into 58kB Python
source code. Boil that down to a static C table and you'll end up
with something around 10-20kB for static C data. Hisao still
builds a dictionary using this data, but perhaps that step
could be avoided using the same techniques that Fredrik used
in boiling down the size of the unicodedata module (which holds
the Unicode Database).
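
To illustrate the kind of table compression meant here, below is a minimal
Python sketch of a two-stage lookup table in which identical blocks are
stored only once. This is a generic technique of that family and not
necessarily exactly what the unicodedata module does. The function names
(split_table, lookup) and the toy mapping are made up for the example and
are not taken from either codec package.

```python
def split_table(table, shift):
    """Split a flat table into (index, blocks), sharing duplicate blocks."""
    size = 1 << shift
    assert len(table) % size == 0
    index = []    # one entry per block: which stored block to use
    blocks = []   # concatenation of the distinct blocks
    seen = {}     # block contents -> block number in `blocks`
    for start in range(0, len(table), size):
        block = tuple(table[start:start + size])
        pos = seen.get(block)
        if pos is None:
            pos = len(blocks) // size
            seen[block] = pos
            blocks.extend(block)
        index.append(pos)
    return index, blocks


def lookup(index, blocks, shift, i):
    """Return the value the original flat table held at position i."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# Toy data (hypothetical): a sparse mapping where 0 means "unmapped".
toy = [0] * 4096
for i in range(0x100, 0x140):
    toy[i] = 0x3000 + (i - 0x100)

index, blocks = split_table(toy, 4)
print(len(toy), "entries stored as", len(index) + len(blocks))
```

Both lists could be emitted as static C arrays, so no dictionary has to be
built at import time. How much this saves depends entirely on how sparse
and repetitive the real mapping is.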

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/