[Python-Dev] Adding Japanese Codecs to the distro

Martin v. Löwis martin@v.loewis.de
22 Jan 2003 15:23:41 +0100


"M.-A. Lemburg" <mal@lemburg.com> writes:

> I was talking about the *installed* size, i.e. the size
> of the package in site-packages:

Right. And we are trying to tell you that this is irrelevant when
talking about the size increase to be expected when JapaneseCodecs is
incorporated into Python.

> degas site-packages/japanese# du
> 337     ./c
> 1252    ./mappings
> 88      ./python
> 8       ./aliases

You should ignore mappings and python in your counting; they are not
needed. Of the directories listed, that leaves c and aliases, i.e.
337 + 8 = 345 of the 1685 blocks shown.

> I wonder whether it wouldn't be possible to use the same tricks
> Hisao used in his codec for a C version.

I believe it does use the same tricks. It's just that the
JapaneseCodecs package supports a number of widely-used encodings
which Hisao's package does not support.

> The source code size is not that important. The install size
> is and even more the memory footprint.

Computing the memory footprint is very difficult, of course.

> Hisao's approach uses a single table which fits into 58kB Python
> source code. Boil that down to a static C table and you'll end up
> with something around 10-20kB for static C data. 

How did you obtain this number? 

> Hisao does still build a dictionary using this data, but perhaps
> that step could be avoided using the same techniques that Fredrik
> used in boiling down the size of the unicodedata module (which holds
> the Unicode Database).

Perhaps, yes. Have you studied the actual data to see whether these
techniques might help or not?
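
For reference, the core trick in the unicodedata generator is to split
one large table into a small index plus a pool of de-duplicated
fixed-size blocks. Below is a minimal sketch of that idea, not
Fredrik's actual code: the names split_table and lookup and the demo
table are hypothetical, and nothing here has been run against the
JapaneseCodecs mappings. It only pays off if many blocks are
identical, which is exactly what would have to be checked on the real
data.

```python
def split_table(table, shift):
    """Split `table` into (index, blocks) such that, for every valid i,
    table[i] == blocks[(index[i >> shift] << shift) + (i & mask)]
    where mask == (1 << shift) - 1.
    """
    size = 1 << shift
    # Pad with zeros so the length is a multiple of the block size.
    padded = list(table) + [0] * (-len(table) % size)
    index = []    # one entry per block of the original table
    blocks = []   # concatenation of the distinct blocks
    seen = {}     # block contents -> block number inside `blocks`
    for start in range(0, len(padded), size):
        block = tuple(padded[start:start + size])
        if block not in seen:
            seen[block] = len(blocks) >> shift
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, i):
    """Recover the original table[i] from the two-level structure."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# Hypothetical sparse mapping: 4096 slots, only 64 of them populated.
SHIFT = 4
table = [0] * 4096
for i in range(0x100, 0x140):
    table[i] = 0x4E00 + i

index, blocks = split_table(table, SHIFT)
print(len(table), "entries ->", len(index) + len(blocks), "entries")
```

With a dense table, where few blocks repeat, the index is pure
overhead and the split gains nothing.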

Regards,
Martin