[Python-Dev] Re: String module
Fredrik Lundh
fredrik@pythonware.com
Thu, 30 May 2002 11:23:35 +0200
François Pinard wrote:
> This reminds me that I often miss, in the standard `ctype.h' and related,
> a function that would un-combine a character into its base character and
> its diacritic, and the complementary re-combining function.
import unicodedata

def uncombine(char):
    chars = unicodedata.decomposition(unichr(ord(char))).split()
    if not chars:
        return [char]
    return [unichr(int(x, 16)) for x in chars if x[0] != "<"]

for char in "François":
    print uncombine(char)
['F']
['r']
['a']
['n']
[u'c', u'\u0327']
['o']
['i']
['s']
(to go the other way, store all uncombinations longer than one
character in a dictionary)
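
One way the dictionary approach might look, as a rough sketch rather than a tested recipe. It assumes Python 3 (where chr replaces unichr), and the names build_recombine_table, RECOMBINE and recombine are made up for illustration. Decompositions tagged like "<compat>" are skipped here on the assumption that only base-plus-diacritic pairs should be recombined.

```python
import sys
import unicodedata

def build_recombine_table():
    """Map each multi-character canonical decomposition to its composed character."""
    table = {}
    for code in range(sys.maxunicode + 1):
        decomposition = unicodedata.decomposition(chr(code))
        # Skip characters with no decomposition, and tagged mappings such as
        # "<compat> 0066 0069", which are not base + diacritic sequences.
        if not decomposition or decomposition.startswith("<"):
            continue
        parts = tuple(chr(int(x, 16)) for x in decomposition.split())
        # Store only "uncombinations" longer than one character.
        if len(parts) > 1:
            # If two characters ever shared a decomposition, keep the first seen.
            table.setdefault(parts, chr(code))
    return table

RECOMBINE = build_recombine_table()

def recombine(chars):
    """Return the composed character if known, else the characters joined as-is."""
    return RECOMBINE.get(tuple(chars), "".join(chars))
```

With this table, recombine(["c", "\u0327"]) gives back the single character U+00E7, and anything not found in the table is passed through unchanged. Later Python versions also provide unicodedata.normalize("NFC", text), which is probably the better choice for whole strings.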
</F>