tjreedy at udel.edu
Thu Dec 15 16:57:24 EST 2016
On 12/15/2016 1:06 PM, MRAB wrote:
> On 2016-12-15 16:53, Steve D'Aprano wrote:
>> Suppose I have a Unicode character, and I want to determine the script or
>> scripts it belongs to.
>> For example:
>> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
>> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
>> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>> Is this information available from Python?
>> More about Unicode scripts:
> Interestingly, there's issue 6331 "Add unicode script info to the
> unicode database". Looks like it didn't make it into Python 3.6.
The issue was opened in 2009 with a patch and 2 revisions, all for 2.x.
At the very least, the Python code needs to be updated.
It was approved in principle by Martin, then the unicodedata curator, who
is no longer active. Neither, for the most part, are the other 2 people
listed in the Experts Index.
From what I could see, both the Python API (there is no doc patch yet)
and internal implementation need more work. If I were to get involved,
I would look at the APIs of PyICU (see Eryk Sun's post) and the
unicodescript module on PyPI (mentioned by Pander Musubi on the issue).
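
Until something like issue 6331 lands, one possible stopgap is to parse the
Unicode Character Database file Scripts.txt directly. What follows is only a
rough sketch, not the API proposed on the issue. The names parse_scripts and
script_of are hypothetical, and SAMPLE_SCRIPTS_TXT is a small hand-written
sample in the Scripts.txt layout with simplified ranges, not a copy of the
real file. Real use would read the full Scripts.txt for the Unicode version
you care about.

```python
import bisect

# Hand-written sample in the Scripts.txt layout (assumption: simplified
# ranges for illustration; the real file is much longer).
SAMPLE_SCRIPTS_TXT = """\
# code point or range ; script name # comment
0030..0039    ; Common # DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # FEMININE ORDINAL INDICATOR
03B1..03C9    ; Greek # GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER OMEGA
"""


def parse_scripts(text):
    """Return a sorted list of (start, end, script) tuples."""
    ranges = []
    for line in text.splitlines():
        # Drop the trailing comment, then skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        codepoints, script = (field.strip() for field in line.split(";"))
        # A field is either a single code point or "START..END".
        start, _, end = codepoints.partition("..")
        ranges.append((int(start, 16), int(end or start, 16), script))
    ranges.sort()
    return ranges


def script_of(char, ranges, default="Unknown"):
    """Look up the script of a single character by binary search."""
    cp = ord(char)
    starts = [start for start, _, _ in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    # Code points not covered by the data fall back to the default.
    return default


RANGES = parse_scripts(SAMPLE_SCRIPTS_TXT)
for ch in "3aξ":
    print("U+%04X" % ord(ch), script_of(ch, RANGES))
```

With the sample data this reports Common, Latin and Greek for the three
characters in the original question. Anything outside the sample ranges
comes back as "Unknown" only because the sample is incomplete.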
Terry Jan Reedy