[Python-Dev] Unicode 5.1.0
Guido van Rossum
guido at python.org
Fri Aug 22 18:12:55 CEST 2008
2008/8/22 Fredrik Lundh <fredrik at pythonware.com>:
> On Fri, Aug 22, 2008 at 4:59 PM, Guido van Rossum <guido at python.org> wrote:
>
>>> (how's the 3.2/4.1 dual support implemented? do we have two distinct
>>> datasets, or are the differences encoded in some clever way? would it
>>> make sense to split the unicodedata module into three separate
>>> modules, one for each major Unicode version?)
>>
>> The current API looks fine to me: unicodedata is the latest version
>> whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the
>> same; there's a tiny bit of code in the generated _db.h file that
>> expresses the differences:
>>
>> static const change_record* get_change_3_2_0(Py_UCS4 n)
>> {
>>     int index;
>>     if (n >= 0x110000) index = 0;
>>     else {
>>         index = changes_3_2_0_index[n>>7];
>>         index = changes_3_2_0_data[(index<<7)+(n & 127)];
>>     }
>>     return change_records_3_2_0+index;
>> }
>
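For readers unfamiliar with the lookup quoted above, here is a rough Python sketch of the same two-level table idea. It is only an illustration: build_tables, get_change and the toy input are hypothetical names, not part of the generator script or of unicodedata, and it assumes that record index 0 means "no change" (which is what the C code returns for out-of-range input).

```python
SHIFT = 7                 # 128 code points per block, matching n>>7 and n & 127
BLOCK = 1 << SHIFT
MAX_CP = 0x110000         # one past the last Unicode code point


def build_tables(sparse):
    """Compress a {code_point: record_index} dict into (index, data) tables.

    Identical 128-entry blocks are stored once in `data`; `index` maps each
    block of the code space to the stored copy. Missing code points get 0.
    """
    index = []            # one entry per block of 128 code points
    data = []             # concatenation of the distinct blocks
    seen = {}             # block contents -> position of that block in data
    for start in range(0, MAX_CP, BLOCK):
        block = tuple(sparse.get(cp, 0) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(data) // BLOCK
            data.extend(block)
        index.append(seen[block])
    return index, data


def get_change(index, data, n):
    """Python counterpart of the quoted C lookup: return a record index."""
    if n >= MAX_CP:
        return 0
    block = index[n >> SHIFT]
    return data[(block << SHIFT) + (n & (BLOCK - 1))]
```

With only a handful of changed code points, nearly every block is the all-zero block and is shared, which is presumably why the difference tables stay small.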
> there's a bunch of data tables as well, but they don't seem to be very
> large. looks like Martin did a thorough job here.
>
> ... digging digging digging ...
>
> yes, the generator script produces difference tables between the main
> version and a list of older versions. I'd say it's worth running the
> script on the 5.1.0 tables, and if it doesn't choke, compare the
> resulting table with the corresponding table for 4.1.0 (a simple loop
> fetching the main properties for all code points). if the differences
> look reasonably small, switch to 5.1.0 and keep the others.
Right, that's my hope as well. I believe the changes between 3.2 and 4.1
were much larger than more recent changes. (Yay convergence! :-)
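The "simple loop fetching the main properties for all code points" could look roughly like the sketch below. It is not the script Fredrik has in mind: property_diff is a hypothetical helper, the choice of properties is an assumption, and since two freshly generated tables cannot be loaded side by side at runtime, it compares any two objects that expose the unicodedata API (for example the module itself and unicodedata.ucd_3_2_0).

```python
import sys
import unicodedata

# Properties assumed to count as "main"; all are exposed by both the
# unicodedata module and the unicodedata.ucd_3_2_0 object.
PROPS = ("category", "bidirectional", "combining", "mirrored", "decomposition")


def property_diff(new, old, code_points=range(sys.maxunicode + 1)):
    """Return {code_point: [(property, old_value, new_value), ...]}.

    Only code points where at least one property differs are included.
    """
    diffs = {}
    for cp in code_points:
        ch = chr(cp)
        changed = []
        for name in PROPS:
            old_value = getattr(old, name)(ch)
            new_value = getattr(new, name)(ch)
            if old_value != new_value:
                changed.append((name, old_value, new_value))
        if changed:
            diffs[cp] = changed
    return diffs


if __name__ == "__main__":
    diffs = property_diff(unicodedata, unicodedata.ucd_3_2_0)
    print(len(diffs), "code points differ from Unicode 3.2.0")
```

The number of differing code points gives a quick feel for how large a difference table between two versions would be.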
> I can tinker a little with this over the weekend, unless Martin tells
> me not to ;-)
That would be great!
--
--Guido van Rossum (home page: http://www.python.org/~guido/)