[Python-Dev] Adding Japanese Codecs to the distro
Martin v. Löwis
martin@v.loewis.de
22 Jan 2003 15:23:41 +0100
"M.-A. Lemburg" <mal@lemburg.com> writes:
> I was talking about the *installed* size, i.e. the size
> of the package in site-packages:
Right. And we are trying to tell you that this is irrelevant when
talking about the size increase to be expected when JapaneseCodecs is
incorporated into Python.
> degas site-packages/japanese# du
> 337 ./c
> 1252 ./mappings
> 88 ./python
> 8 ./aliases
You should ignore mappings and python in your counting, they are not
needed.
> I wonder whether it wouldn't be possible to use the same tricks
> Hisao used in his codec for a C version.
I believe it does use the same tricks. It's just that the
JapaneseCodecs package supports a number of widely-used encodings
which Hisao's package does not support.
> The source code size is not that important. The install size
> is, and even more so the memory footprint.
Computing the memory footprint is very difficult, of course.
> Hisao's approach uses a single table which fits into 58kB Python
> source code. Boil that down to a static C table and you'll end up
> with something around 10-20kB for static C data.
How did you obtain this number?
> Hisao still builds a dictionary using this data, but perhaps
> that step could be avoided using the same techniques that Fredrik
> used in boiling down the size of the unicodedata module (which holds
> the Unicode Database).
Perhaps, yes. Have you studied the actual data to see whether these
techniques might help or not?
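To make the question concrete, here is a minimal sketch of the kind of
two-level table split that, as far as I understand it, the unicodedata
generator relies on: the flat table is cut into fixed-size blocks,
identical blocks are stored once, and a small index maps each block
position to its stored copy. The names split_table and lookup and the toy
data are assumptions made for illustration only; none of this is taken
from either codec package, and whether a real mapping table compresses
well depends on how many of its blocks repeat.

```python
def split_table(table, shift):
    """Split a flat table into (index, blocks) using blocks of 2**shift entries.

    Identical blocks are stored only once in `blocks`; `index` records,
    for each block position, which stored block to use.
    """
    size = 1 << shift
    assert len(table) % size == 0, "pad the table to a multiple of the block size"
    index = []
    blocks = []
    seen = {}
    for start in range(0, len(table), size):
        block = tuple(table[start:start + size])
        pos = seen.get(block)
        if pos is None:
            # First time this block is seen: append it and remember its number.
            pos = len(blocks) >> shift
            seen[block] = pos
            blocks.extend(block)
        index.append(pos)
    return index, blocks


def lookup(index, blocks, shift, i):
    """Return the entry that the original flat table held at position i."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# Toy data (hypothetical): a mostly empty table with one populated run.
SHIFT = 7
table = [0] * 8192
table[100:200] = range(0x3000, 0x3000 + 100)

index, blocks = split_table(table, SHIFT)
print(len(table), "entries ->", len(index), "index entries +", len(blocks), "block entries")
```

The saving comes entirely from repeated blocks (here, the all-zero ones),
so a densely populated mapping with few repeats would gain little from
this particular trick.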
Regards,
Martin