[Python-Dev] Unicode 5.1.0
Guido van Rossum
guido at python.org
Fri Aug 22 16:59:46 CEST 2008
On Fri, Aug 22, 2008 at 3:47 AM, Fredrik Lundh <fredrik at pythonware.com> wrote:
> On Fri, Aug 22, 2008 at 3:25 AM, Guido van Rossum <guido at python.org> wrote:
[MAL]
>>> So while we could say: "we provide access to the Unicode 5.1.0
>>> database", we cannot say: "we support Unicode 5.1.0", simply because
>>> we have not reviewed all the necessary changes and implications.
>>
>> Mark's response to this was:
>>
>> """
>> I'd suspect that you'll be as conformant to U5.1.0 as you were to U4.1.0 ;-)
>
> is the suggestion to *replace* the 4.1.0 database with a 5.1.0
> database, or to add yet another database in that module?
That's up to us. I don't know what the reason was for keeping the
3.2.0 database around -- does anyone here recall ever using it? For
what?

I think Mark believes that 5.1.0 is very much backwards compatible
with 4.1.0 so that there is no need to retain access to 4.1.0; but as
I said I don't know the use case so who knows.
> (how's the 3.2/4.1 dual support implemented? do we have two distinct
> datasets, or are the differences encoded in some clever way? would it
> make sense to split the unicodedata module into three separate
> modules, one for each major Unicode version?)
The current API looks fine to me: unicodedata is the latest version
whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the
same; there's a tiny bit of code in the generated _db.h file that
expresses the differences:
static const change_record* get_change_3_2_0(Py_UCS4 n)
{
    int index;
    if (n >= 0x110000) index = 0;
    else {
        index = changes_3_2_0_index[n>>7];
        index = changes_3_2_0_data[(index<<7)+(n & 127)];
    }
    return change_records_3_2_0+index;
}

static Py_UCS4 normalization_3_2_0(Py_UCS4 n)
{
    switch(n) {
    case 0x2f868: return 0x2136A;
    case 0x2f874: return 0x5F33;
    case 0x2f91f: return 0x43AB;
    case 0x2f95f: return 0x7AAE;
    case 0x2f9bf: return 0x4D57;
    default: return 0;
    }
}
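
For illustration, the same difference can be seen from the Python side by
normalizing the five code points listed in normalization_3_2_0 against both
databases. This is only a rough sketch: it assumes a Python 3 build whose
unicodedata module exposes ucd_3_2_0 as described above, and the helper name
nfd_pair is made up for this example.

```python
import unicodedata

# Code points special-cased by normalization_3_2_0 in the generated header.
CHANGED = [0x2F868, 0x2F874, 0x2F91F, 0x2F95F, 0x2F9BF]


def nfd_pair(code_point):
    """Return (old, new) NFD results as lists of code points.

    Hypothetical helper for this example; not part of unicodedata.
    """
    ch = chr(code_point)
    old = unicodedata.ucd_3_2_0.normalize("NFD", ch)  # 3.2.0 database
    new = unicodedata.normalize("NFD", ch)            # latest database
    return [ord(c) for c in old], [ord(c) for c in new]


if __name__ == "__main__":
    print("latest database:", unicodedata.unidata_version)
    print("older database: ", unicodedata.ucd_3_2_0.unidata_version)
    for cp in CHANGED:
        old, new = nfd_pair(cp)
        old_s = " ".join(f"U+{c:04X}" for c in old)
        new_s = " ".join(f"U+{c:04X}" for c in new)
        print(f"U+{cp:05X}: 3.2.0 -> {old_s}; latest -> {new_s}")
```

Both objects take the same normalize() call, which is the point of keeping
the APIs identical; only the underlying data differs.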
--
--Guido van Rossum (home page: http://www.python.org/~guido/)