Unicode script

eryk sun eryksun at gmail.com
Thu Dec 15 13:01:58 EST 2016

On Thu, Dec 15, 2016 at 4:53 PM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> Suppose I have a Unicode character, and I want to determine the script or
> scripts it belongs to.
> For example:
> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
> Is this information available from Python?

Tools/unicode/makeunicodedata.py doesn't include data from "Scripts.txt",
so the unicodedata module can't tell you this. If adding an external
dependency is ok, then you can use PyICU. For example, after importing icu:

    >>> icu.Script.getScript('\u0033').getName()
    'Common'
    >>> icu.Script.getScript('\u0061').getName()
    'Latin'
    >>> icu.Script.getScript('\u03be').getName()
    'Greek'

PyICU doesn't have documentation specific to Python, so you'll have to
figure things out experimentally, with reference to ICU's C API.
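If an external dependency isn't acceptable, one option is to parse
Scripts.txt from the Unicode Character Database yourself (for example,
a copy downloaded from https://www.unicode.org/Public/UNIDATA/Scripts.txt).
The following is only a rough sketch under that assumption. The helper
names load_scripts and script_of are hypothetical, not part of any
library. Scripts.txt assigns exactly one Script value per code point, and
unlisted code points default to "Unknown". Characters shared by several
scripts are listed separately in ScriptExtensions.txt, which this sketch
does not read.

```python
import bisect
import re

# Matches data lines in Scripts.txt, e.g.
#   "0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE"
# A single code point has no "..END" part.
_LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def load_scripts(lines):
    """Hypothetical helper: parse Scripts.txt lines into a lookup table."""
    ranges = []
    for line in lines:
        m = _LINE.match(line)
        if m is None:  # comment or blank line
            continue
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        ranges.append((start, end, m.group(3)))
    # The file is grouped by script, not by code point, so sort it.
    ranges.sort()
    starts = [r[0] for r in ranges]
    return starts, ranges

def script_of(ch, table):
    """Hypothetical helper: return the script name for a single character."""
    starts, ranges = table
    cp = ord(ch)
    # Find the last range starting at or before cp, then check its end.
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return "Unknown"  # default for code points not listed

# Usage with a few lines in Scripts.txt format. For the real file, use:
#   with open("Scripts.txt", encoding="utf-8") as f:
#       table = load_scripts(f)
sample = [
    "0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE",
    "0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z",
    "03B1..03C1    ; Greek # L&  [17] GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER RHO",
]
table = load_scripts(sample)
print(script_of("\u0033", table))  # Common
print(script_of("\u0061", table))  # Latin
print(script_of("\u03be", table))  # Greek
```

Sorting once and using bisect keeps each lookup logarithmic in the
number of ranges, which matters because the full file has a few
thousand entries.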

