Unicode script
eryk sun
eryksun at gmail.com
Thu Dec 15 13:01:58 EST 2016
On Thu, Dec 15, 2016 at 4:53 PM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> Suppose I have a Unicode character, and I want to determine the script or
> scripts it belongs to.
>
> For example:
>
> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>
> Is this information available from Python?
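Tools/unicode/makeunicodedata.py, which generates the tables behind the
unicodedata module, doesn't include data from "Scripts.txt", so the
stdlib can't tell you a character's script. If adding an external
dependency is ok, then you can use PyICU. For example: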
>>> import icu
>>> icu.Script.getScript('\u0033').getName()
'Common'
>>> icu.Script.getScript('\u0061').getName()
'Latin'
>>> icu.Script.getScript('\u03be').getName()
'Greek'
There isn't documentation specific to Python, so you'll have to figure
things out experimentally with reference to the C API.
http://icu-project.org/apiref/icu4c
http://icu-project.org/apiref/icu4c/uscript_8h.html
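If an external dependency isn't an option, one stdlib-only alternative is to parse Scripts.txt from the Unicode Character Database yourself and look code points up by range. The sketch below assumes you have downloaded a copy of Scripts.txt; to keep it self-contained it embeds a few illustrative lines in the same "start..end ; Script # comment" format (not a verbatim excerpt). The helper names parse_scripts and make_lookup are hypothetical. Note that Scripts.txt assigns exactly one Script value per code point; the "belongs to several scripts" case is described by ScriptExtensions.txt, which this sketch ignores.

```python
import bisect

# Illustrative lines in the Scripts.txt format (assumption: not the real file).
# In practice, read the full Scripts.txt from the Unicode Character Database.
SAMPLE_SCRIPTS_TXT = """\
# comment lines and trailing comments are ignored
0030..0039    ; Common # Nd  DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00D7          ; Common # Sm  MULTIPLICATION SIGN
03B1..03C9    ; Greek # L&  GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER OMEGA
"""


def parse_scripts(text):
    """Return sorted (first, last, script) tuples from Scripts.txt-format text."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        codepoints, script = (field.strip() for field in line.split(";"))
        first, _, last = codepoints.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo  # single code point entries
        ranges.append((lo, hi, script))
    ranges.sort()
    return ranges


def make_lookup(ranges):
    """Build a function mapping a character to its script via binary search."""
    starts = [lo for lo, _, _ in ranges]

    def script_of(ch):
        cp = ord(ch)
        i = bisect.bisect_right(starts, cp) - 1  # last range starting <= cp
        if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
            return ranges[i][2]
        # Scripts.txt: code points not listed have the value Unknown (Zzzz).
        return "Unknown"

    return script_of


script_of = make_lookup(parse_scripts(SAMPLE_SCRIPTS_TXT))
print(script_of("3"), script_of("a"), script_of("\u03be"))
```

With the real file you would build the lookup from its contents instead, e.g. open("Scripts.txt", encoding="utf-8").read() (path assumed), and the same "Common", "Latin", "Greek" names come back for the examples in the question.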