[Python-Dev] Unicode 5.1.0

"Martin v. Löwis" martin at v.loewis.de
Sun Aug 24 21:35:24 CEST 2008


> is the suggestion to *replace* the 4.1.0 database with a 5.1.0
> database, or to add yet another database in that module?

I would replace it.

> (how's the 3.2/4.1 dual support implemented?

The compiler needs data files for all supported versions, with
old_versions listing the, well, old versions. It then computes
deltas against the current version, expecting that they mostly
consist of new assignments (i.e. characters unassigned in 3.2
might be assigned in newer versions). It detects every difference,
but might not be able to represent every kind of change.
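
To illustrate the delta step, here is a minimal Python sketch. It is
not the actual generator code: the name compute_delta and the toy
dictionaries are hypothetical, and each "database" is reduced to a
mapping from code point to general category.

```python
def compute_delta(old, new):
    """Compare two toy databases (code point -> general category).

    Returns (newly_assigned, changed, removed):
      newly_assigned: code points present only in `new` (the expected case)
      changed:        code points whose category differs between versions
      removed:        code points present only in `old`
    """
    newly_assigned = sorted(cp for cp in new if cp not in old)
    changed = {
        cp: (old[cp], new[cp])
        for cp in old
        if cp in new and old[cp] != new[cp]
    }
    removed = sorted(cp for cp in old if cp not in new)
    return newly_assigned, changed, removed


# Toy data, made up for illustration only (not real UCD records).
old = {0x41: "Lu", 0x61: "Ll", 0x100: "So"}
new = {0x41: "Lu", 0x61: "Ll", 0x100: "Sm", 0x101: "Lu", 0x102: "Ll"}

newly_assigned, changed, removed = compute_delta(old, new)
print([hex(cp) for cp in newly_assigned])
print(changed)
print(removed)
```

In this sketch the newly_assigned list is the cheap, common case,
while entries in changed or removed stand for the differences that a
simple "list of new characters" encoding could detect but not
necessarily represent.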

> do we have two distinct
> datasets, or are the differences encoded in some clever way?

The latter. It doesn't need to be especially clever: what is mainly
needed is a compressed list of "new" characters for each version.
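
A small sketch of what such a list could look like, seen from the
outside. It uses the standard unicodedata module and its ucd_3_2_0
object to find BMP code points that are unassigned in 3.2 but assigned
in whatever version the running interpreter ships, then collapses them
into ranges. The helper to_ranges is hypothetical, and run-length
ranges are only one possible compression, assumed here for
illustration; this is not the module's internal storage format.

```python
import unicodedata


def to_ranges(code_points):
    """Collapse code points into inclusive (first, last) ranges."""
    ranges = []
    for cp in sorted(code_points):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp          # extend the current run
        else:
            ranges.append([cp, cp])     # start a new run
    return [tuple(r) for r in ranges]


old_db = unicodedata.ucd_3_2_0  # the 3.2.0 view of the database

# "Cn" is the general category reported for unassigned code points.
new_in_bmp = [
    cp
    for cp in range(0x10000)
    if old_db.category(chr(cp)) == "Cn"
    and unicodedata.category(chr(cp)) != "Cn"
]

ranges = to_ranges(new_in_bmp)
print(len(new_in_bmp), "new BMP code points in", len(ranges), "ranges")
```

Because newly assigned characters tend to arrive in contiguous blocks,
the number of ranges is typically much smaller than the number of code
points, which is where the space saving comes from.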

> would it
> make sense to split the unicodedata module into three separate
> modules, one for each major Unicode version?)

You would lose the space savings then, I suppose.

Regards,
Martin

