Unicode troubles
Michael Radziej
mir at news.m1.spieleck.de
Fri Oct 10 08:51:29 EDT 2003
Rodrigo Benenson wrote:
> Sometimes I get len(u"eló") = 3 (the good result) and other times
> len(u"eló") = 4 (wrong result). This seems independent of the OS.
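There are different ways to represent accented characters in Unicode.
E.g. "ó" can be a single precomposed character (U+00F3),
or a plain "o" followed by a combining acute accent (U+0301).
That is why len() gives 3 in one case and 4 in the other.
What you want is to bring all your strings into the same
normalization form, e.g. the composed canonical form (NFC).
Take a look at unicodedata.normalize (note that it is
new in Python 2.3):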
http://www.python.org/doc/current/lib/module-unicodedata.html
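
A minimal sketch of the idea, assuming the two lengths really do come from a composed and a decomposed spelling of "ó" (I can't tell that from your post, so check your own data). The variable names are just for illustration, and the escapes spell out both forms explicitly so the result doesn't depend on your editor or source encoding:

```python
import unicodedata

# "ó" as one precomposed code point (U+00F3)
composed = u"el\u00f3"
# "o" followed by a combining acute accent (U+0301)
decomposed = u"elo\u0301"

print(len(composed))    # 3
print(len(decomposed))  # 4

# NFC composes base character + combining mark where a precomposed form exists
nfc = unicodedata.normalize("NFC", decomposed)
print(len(nfc))         # 3
print(nfc == composed)  # True
```

If you normalize every string to the same form as it enters your program, len() and == should behave consistently. "NFD" goes the other way and would give 4 for both.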
Hope this helps,
Michael Radziej