Unicode script

Terry Reedy tjreedy at udel.edu
Thu Dec 15 16:57:24 EST 2016

On 12/15/2016 1:06 PM, MRAB wrote:
> On 2016-12-15 16:53, Steve D'Aprano wrote:
>> Suppose I have a Unicode character, and I want to determine the script or
>> scripts it belongs to.
>> For example:
>> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
>> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
>> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>> Is this information available from Python?
>> More about Unicode scripts:
>> http://www.unicode.org/reports/tr24/
>> http://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt
>> http://www.unicode.org/Public/UCD/latest/ucd/ScriptExtensions.txt
> Interestingly, there's issue 6331 "Add unicode script info to the
> unicode database". Looks like it didn't make it into Python 3.6.

The issue was opened in 2009 with a patch and 2 revisions, all for 2.x.
At the least, the Python code needs to be updated.

It was approved in principle by Martin, then the unicodedata curator but
no longer active.  Neither, very much, are the other 2 listed in the
Experts Index.

From what I could see, both the Python API (there is no doc patch yet)
and the internal implementation need more work.  If I were to get involved,
I would look at the APIs of PyICU (see Eryk Sun's post) and the
unicodescript module on PyPI (mentioned by Pander Musubi on the issue).
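
In the meantime, a lookup can be built by hand from the Scripts.txt file
linked above.  What follows is only a minimal sketch, not the API proposed
on the issue: the names load_scripts and script_of are made up for
illustration, and SAMPLE is a few hand-written lines in the Scripts.txt
format rather than the real file, so anything outside those lines comes
back as "Unknown".  ScriptExtensions.txt is not handled.

```python
import bisect

# Hand-written lines in the Scripts.txt format (NOT the real file).
SAMPLE = """\
# Comment lines and blank lines are ignored.
0030..0039    ; Common # DIGIT ZERO..DIGIT NINE
0061..007A    ; Latin  # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin  # FEMININE ORDINAL INDICATOR
03B1..03C9    ; Greek  # GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER OMEGA
"""


def load_scripts(text):
    """Parse Scripts.txt-style text into (starts, ranges) for bisecting."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        codes, script = (field.strip() for field in line.split(";"))
        first, _, last = codes.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        ranges.append((start, end, script))
    ranges.sort()
    starts = [start for start, _, _ in ranges]
    return starts, ranges


def script_of(char, table, default="Unknown"):
    """Return the script name for one character, or default if unlisted."""
    starts, ranges = table
    cp = ord(char)
    # Index of the last range whose start is <= cp.
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0:
        _, end, script = ranges[i]
        if cp <= end:
            return script
    return default


table = load_scripts(SAMPLE)
for ch in "3a\u03be":
    print("U+%04X" % ord(ch), script_of(ch, table))
```

With the sample data this prints Common, Latin and Greek for the three
characters in Steve's question.  To use the real data, pass the contents
of a downloaded Scripts.txt to load_scripts instead of SAMPLE.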

Terry Jan Reedy
