[Python-Dev] Unicode 5.1.0

Guido van Rossum guido at python.org
Fri Aug 22 16:59:46 CEST 2008

On Fri, Aug 22, 2008 at 3:47 AM, Fredrik Lundh <fredrik at pythonware.com> wrote:
> On Fri, Aug 22, 2008 at 3:25 AM, Guido van Rossum <guido at python.org> wrote:
>>> So while we could say: "we provide access to the Unicode 5.1.0
>>> database", we cannot say: "we support Unicode 5.1.0", simply because
>>> we have not reviewed all the necessary changes and implications.
>> Mark's response to this was:
>> """
>> I'd suspect that you'll be as conformant to U5.1.0 as you were to U4.1.0 ;-)
> is the suggestion to *replace* the 4.1.0 database with a 5.1.0
> database, or to add yet another database in that module?

That's up to us. I don't know what the reason was for keeping the
3.2.0 database around -- does anyone here recall ever using it? For

I think Mark believes that 5.1.0 is very much backwards compatible
with 4.1.0 so that there is no need to retain access to 4.1.0; but as
I said I don't know the use case so who knows.
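Keeping or dropping an old database matters most for code points that were still unassigned in the older version. The sketch below compares the two through the public API. It assumes a modern Python 3, whose default `unicodedata` tables are much newer than the 5.1.0 under discussion. U+20B9 INDIAN RUPEE SIGN is simply a convenient character that was assigned after Unicode 3.2.

```python
import unicodedata

old_db = unicodedata.ucd_3_2_0   # the retained 3.2.0 database, same methods as the module
rupee = "\u20b9"                 # INDIAN RUPEE SIGN, assigned in Unicode 6.0

current_cat = unicodedata.category(rupee)  # looked up in the newest bundled database
old_cat = old_db.category(rupee)           # same code point as seen by 3.2.0
old_name = old_db.name(rupee, None)        # default is returned when 3.2.0 has no name

print(unicodedata.unidata_version, old_db.unidata_version)
print(current_cat, old_cat, old_name)
```

The 3.2.0 object reports such code points as category `Cn` (unassigned) and gives them no name, while the module-level functions answer from the current data.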

> (how's the 3.2/4.1 dual support implemented?  do we have two distinct
> datasets, or are the differences encoded in some clever way?  would it
> make sense to split the unicodedata module into three separate
> modules, one for each major Unicode version?)

The current API looks fine to me: unicodedata is the latest version
whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the
same; there's a tiny bit of code in the generated _db.h file that
expresses the differences:

static const change_record* get_change_3_2_0(Py_UCS4 n)
{
        int index;
        if (n >= 0x110000) index = 0;
        else {
                index = changes_3_2_0_index[n>>7];
                index = changes_3_2_0_data[(index<<7)+(n & 127)];
        }

        return change_records_3_2_0+index;
}

static Py_UCS4 normalization_3_2_0(Py_UCS4 n)
{
        switch(n) {
        case 0x2f868: return 0x2136A;
        case 0x2f874: return 0x5F33;
        case 0x2f91f: return 0x43AB;
        case 0x2f95f: return 0x7AAE;
        case 0x2f9bf: return 0x4D57;
        default: return 0;
        }
}
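Both helpers above are plain table lookups. The first uses a two-stage table. The high bits of the code point (`n>>7`) pick a block number from `changes_3_2_0_index`, and the low 7 bits (`n & 127`) pick the entry inside that block of `changes_3_2_0_data`. Tables of this kind save space because identical 128-entry blocks are stored only once. The second helper is a short list of overrides for five CJK compatibility ideographs whose decomposition mappings were corrected after 3.2.0.

The Python below is a sketch of the same idea, not the generator's actual code. `build_two_stage`, `lookup` and the toy `values` table are hypothetical, and the record numbers 1 and 2 are made up. Only the five override pairs are copied from the excerpt.

```python
import unicodedata

BLOCK_SHIFT = 7                 # same split as n>>7 / n & 127 in the excerpt
BLOCK_SIZE = 1 << BLOCK_SHIFT   # 128 entries per block
MAX_CODE_POINT = 0x110000

def build_two_stage(values):
    """Hypothetical builder: split a flat per-code-point list into
    (index, data), storing each distinct 128-entry block only once."""
    index, data, seen = [], [], {}
    for start in range(0, len(values), BLOCK_SIZE):
        block = tuple(values[start:start + BLOCK_SIZE])
        if block not in seen:
            seen[block] = len(data) >> BLOCK_SHIFT  # block number within data
            data.extend(block)
        index.append(seen[block])
    return index, data

def lookup(index, data, n):
    """Mirror of get_change_3_2_0: out-of-range code points map to entry 0."""
    if n >= MAX_CODE_POINT:
        return 0
    block = index[n >> BLOCK_SHIFT]
    return data[(block << BLOCK_SHIFT) + (n & (BLOCK_SIZE - 1))]

# The five overrides from normalization_3_2_0, copied from the excerpt.
NORMALIZATION_3_2_0 = {
    0x2F868: 0x2136A,
    0x2F874: 0x5F33,
    0x2F91F: 0x43AB,
    0x2F95F: 0x7AAE,
    0x2F9BF: 0x4D57,
}

def normalization_3_2_0(n):
    """Return the 3.2.0 replacement for n, or 0 when there is none."""
    return NORMALIZATION_3_2_0.get(n, 0)

# Toy table (hypothetical): 0 means "no change record"; 1 and 2 are invented.
values = [0] * MAX_CODE_POINT
values[0x20B9] = 1
values[0x2F868] = 2
index, data = build_two_stage(values)
print(len(index), len(data))  # one index slot per block; three distinct blocks in data
```

Only three distinct blocks survive here (all zeros, plus the two blocks holding the nonzero entries), so `data` holds 384 entries while `index` has one slot for each of the 8704 blocks. The override list itself is observable through `unicodedata.ucd_3_2_0.normalize`, which maps each of the five ideographs to the replacement listed above.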

--Guido van Rossum (home page: http://www.python.org/~guido/)
