unicode study with unicodedata module
xah at xahlee.org
Tue Mar 15 13:55:17 CET 2005
how do i get a unicode char's number (its code point)?
e.g. 03ba for greek lowercase kappa (or the same in decimal form)?
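One way that should work, as a minimal sketch: the built-in ord() returns a character's code point as an integer, and format() can render it in hex. This is written in Python 3 syntax (the quoted code below is Python 2, where unichr() plays the role of chr()). The helper name codepoint_hex is made up for illustration and is not part of any module.

```python
import unicodedata

def codepoint_hex(ch):
    """Return the code point of a single character as a zero-padded hex string.

    Hypothetical helper for illustration; it only wraps the built-in ord().
    """
    return format(ord(ch), '04x')

# Look the character up by its official Unicode name.
kappa = unicodedata.lookup('GREEK SMALL LETTER KAPPA')

print(ord(kappa))            # decimal form: 954
print(codepoint_hex(kappa))  # hex form: 03ba
print(chr(0x03ba) == kappa)  # chr() goes the other way: True
```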
Xah Lee wrote:
> python has this nice unicodedata module that deals with unicode
> #-*- coding: utf-8 -*-
> # python
> from unicodedata import *
> # each unicode char has a unique name.
> # one can use the “lookup” func to find it
> mychar=lookup('greek cApital letter sIgma')
> # note letter case doesn't matter
> print mychar.encode('utf-8')
> m=lookup('CJK UNIFIED IDEOGRAPH-5929')
> # for some reason, case must be right here.
> print m.encode('utf-8')
> # to find a char's name, use the “name” function
> print name(u'天')
> basically, in unicode, each char has a number of attributes (called
> properties) besides its name. These attributes provide the information
> necessary to form letters and words, or to do processing such as sorting,
> etc, of various human scripts. For example, the Latin alphabet has two
> forms, upper case and lower case. Korean letters are stacked
> together into syllable blocks. Many symbols correspond to numbers, and
> there are combining forms used, for example, to put a bar over any letter
> or character. Also, some writing systems are directional. In order to
> arrange these symbols for display or process them for computing, info
> on each char is necessary.
> the rest of the functions in unicodedata return these attributes.
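To make the properties mentioned above concrete, here is a rough sketch in Python 3 syntax that collects a few of them for sample characters: a Latin capital, a combining macron (the "bar over any letter"), an Arabic-Indic digit, and a Hebrew letter from a right-to-left script. The function name describe and the choice of sample characters are my own, picked only for illustration. All calls are regular unicodedata functions.

```python
import unicodedata

def describe(ch):
    """Collect a few Unicode properties of one character into a dict.

    Hypothetical helper; it only groups existing unicodedata calls.
    """
    return {
        'name': unicodedata.name(ch, '<unnamed>'),
        'category': unicodedata.category(ch),            # e.g. Lu = uppercase letter
        'bidirectional': unicodedata.bidirectional(ch),  # e.g. L, R, AN, NSM
        'combining': unicodedata.combining(ch),          # 0 means not a combining char
        'numeric': unicodedata.numeric(ch, None),        # None if the char is not a number
        'decomposition': unicodedata.decomposition(ch),
    }

samples = [
    'A',       # LATIN CAPITAL LETTER A
    '\u0304',  # COMBINING MACRON
    '\u0665',  # ARABIC-INDIC DIGIT FIVE
    '\u05d0',  # HEBREW LETTER ALEF
]

for ch in samples:
    print('U+%04X' % ord(ch), describe(ch))
```

The category, bidirectional and combining values are the ones that carry the case, direction and stacking information described in the quoted text.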
> see unicodedata doc:
> Official word on unicode character properties:
> i don't know what's the state of Perl's unicode. Is there something
> this post is archived at
> xah at xahlee.org