regenerating unicodedata for py2.7 using py3 makeunicodedata.py?
Vlastimil Brom
vlastimil.brom at gmail.com
Fri Nov 19 09:49:21 EST 2010
2010/11/18 Martin v. Loewis <martin at v.loewis.de>:
>
>> Thanks for the confirmation Martin!
>>
>> Do you think the mentioned omission of the character names of some
>> CJK ranges in unicodedata is intended, or should it be reported to the
>> tracker?
>
> It's certainly a bug. So a bug report would be appreciated, but much
> more so a patch. Ideally, the patch would either be completely
> forward-compatible (should the CJK ranges change in future Unicode
> versions),
> or at least have a safe-guard to detect that the data file is getting
> out of sync with the C implementation.
>
> Regards,
> Martin
>
Thanks,
I just created a bug ticket:
http://bugs.python.org/issue10459
The omissions of character names seem to be:
龼 (0x9fbc) - 鿋 (0x9fcb)
(CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])
𪜀 (0x2a700) - 𫜴 (0x2b734)
(CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])
𫝀 (0x2b740) - 𫠝 (0x2b81d)
(CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])
(Also the unprintable ASCII controls, Surrogates and Private use area,
where the missing names are probably ok.)
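
A minimal sketch of how such gaps could be listed, assuming Python 3
(on a narrow 2.7 build, unichr() would not accept code points above
0xFFFF). The helper names find_unnamed and collapse are only
illustrative and are not part of unicodedata; the result depends on the
Unicode database bundled with the interpreter, so a fixed build may
report nothing for the CJK ranges.

```python
import unicodedata


def find_unnamed(start, end):
    """Return code points in [start, end] for which unicodedata has no name."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), None) is None]


def collapse(code_points):
    """Collapse a sorted list of code points into (first, last) runs."""
    runs = []
    for cp in code_points:
        if runs and cp == runs[-1][1] + 1:
            # Extend the current run by one code point.
            runs[-1] = (runs[-1][0], cp)
        else:
            # Start a new run.
            runs.append((cp, cp))
    return runs


if __name__ == "__main__":
    # Block boundaries as listed above.
    for lo, hi in [(0x4E00, 0x9FFF), (0x2A700, 0x2B73F), (0x2B740, 0x2B81F)]:
        for first, last in collapse(find_unnamed(lo, hi)):
            print("U+%04X - U+%04X" % (first, last))
```

Unassigned code points at the end of a block are reported as well, so
the output has to be compared against the assigned ranges.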
Unfortunately, I am not able to provide a patch, mainly because
unicodedata is C code.
A while ago I considered writing some unicodedata enhancements in
Python, which would support the ranges and script names, full category
names etc., but so far direct programmatic lookups in the online
Unicode docs, combined with some simple processing, have worked
sufficiently well...
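
For illustration only, a range lookup of that kind could be sketched in
pure Python roughly as follows. The table holds just the three blocks
mentioned above; BLOCKS and block_name are hypothetical names, and a
real version would presumably load the full block list from the Unicode
data files.

```python
import bisect

# (first, last, block name), sorted by first code point.
# Only the blocks mentioned in this message; not a complete table.
BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
]
_STARTS = [first for first, _last, _name in BLOCKS]


def block_name(cp):
    """Return the block name for code point cp, or None if not in the table."""
    # Find the last block whose start is <= cp, then check its upper bound.
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


if __name__ == "__main__":
    for cp in (0x9FBC, 0x2A700, 0x2B81D, 0x41):
        print("U+%04X: %s" % (cp, block_name(cp)))
```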
Regards,
Vlastimil Brom