[Python-Dev] Adding Japanese Codecs to the distro

M.-A. Lemburg mal@lemburg.com
Wed, 22 Jan 2003 16:08:44 +0100


Martin v. Löwis wrote:
> "M.-A. Lemburg" <mal@lemburg.com> writes:
>
>>I was talking about the *installed* size, ie. the size
>>of the package in site-packages:
>
> Right. And we are trying to tell you that this is irrelevant when
> talking about the size increase to be expected when JapaneseCodecs is
> incorporated into Python.

Why is it irrelevant? If it were irrelevant, Fredrik wouldn't
have invested so much time in trimming down the footprint of the
Unicode database.

What we need is a generic approach here which works for more
than just the Japanese codecs. I believe that those codecs
could provide a good basis for codecs for other Asian locales,
but before adding megabytes of mapping tables, I'd prefer to
settle for a good design first.

>>Hisao's approach uses a single table which fits into 58kB Python
>>source code. Boil that down to a static C table and you'll end up
>>with something around 10-20kB for static C data.
>
> How did you obtain this number?

By looking at the code. It uses Unicode literals to define
the table.
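
As a rough sanity check of that figure, here is a minimal sketch. It
assumes the table source consists mostly of \uXXXX escapes (6 bytes of
source text per 16-bit code unit); that layout is an assumption for
illustration, not something verified against Hisao's code, and
c_table_bytes is a hypothetical helper name.

```python
def c_table_bytes(table_literal_source: str) -> int:
    """Bytes needed to store the decoded literal as 16-bit code units.

    table_literal_source is the text between the quotes of a Unicode
    literal, e.g. r"\u3042\u3044" (assumed to be pure ASCII escapes).
    """
    decoded = table_literal_source.encode("ascii").decode("unicode_escape")
    return len(decoded.encode("utf-16-le"))


# Three escaped characters: 18 bytes of source, 6 bytes as a C array
# of 16-bit values.
sample_source = r"\u3042\u3044\u3046"
source_bytes = len(sample_source)
table_bytes = c_table_bytes(sample_source)
```

Under that assumption the ratio is 3:1, so 58kB of source would come to
roughly 19kB of static data, the upper end of the range quoted above.
Any part of the source that is not escape sequences would lower the
result.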

>>Hisao does still build a dictionary using this data, but perhaps
>>that step could be avoided using the same techniques that Fredrik
>>used in boiling down the size of the unicodedata module (which holds
>>the Unicode Database).
>
> Perhaps, yes. Have you studied the actual data to see whether these
> techniques might help or not?

It's just a hint: mapping tables are a trade-off between fast lookup and
memory consumption, and Fredrik's decomposition approach handles that
trade-off rather well (Tamito already uses such an approach). cdb would
provide an alternative, but there are licensing problems...
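
To make the idea concrete, here is a minimal sketch of a two-level
table. It assumes that "decomposition" means splitting a flat mapping
table into a small index plus fixed-size blocks, with identical blocks
stored only once. It is written in the spirit of that technique and is
not the actual unicodedata or JapaneseCodecs code; split_table and
lookup are hypothetical names.

```python
def split_table(values, shift):
    """Split a flat table into (index, blocks), sharing duplicate blocks.

    Afterwards values[i] equals
    blocks[(index[i >> shift] << shift) + (i & mask)].
    """
    size = 1 << shift
    # Pad with zeros so the table is a whole number of blocks.
    padded = list(values) + [0] * (-len(values) % size)
    index = []
    blocks = []
    seen = {}  # block contents -> block number
    for start in range(0, len(padded), size):
        block = tuple(padded[start:start + size])
        if block not in seen:
            seen[block] = len(blocks) // size
            blocks.extend(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, shift, i):
    """Two array accesses instead of a dictionary lookup."""
    mask = (1 << shift) - 1
    return blocks[(index[i >> shift] << shift) + (i & mask)]


# A sparse 1024-entry table with only two non-zero mappings.
values = [0] * 1024
values[0x41] = 0x3042
values[0x300] = 0x3044
index, blocks = split_table(values, 4)
```

The lookup stays constant-time, while long runs of identical entries
(unmapped ranges, for instance) collapse into a single shared block.
How much this saves on real codec tables depends on how the data is
distributed, so the shift value would have to be tuned per table.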

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/