Unicode troubles

Michael Radziej mir at news.m1.spieleck.de
Fri Oct 10 08:51:29 EDT 2003


Rodrigo Benenson wrote:

> Sometimes I get len(u"eló") = 3 (the good result) and other times
> len(u"eló") = 4 (wrong result). This seems independent of the OS.

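There are different ways to represent accented characters in Unicode.
E.g. you can write "ó" as a single precomposed character (U+00F3),
or as "o" followed by a combining acute accent (U+0301).
The second spelling is two code points, and len() counts code points,
so that is where the 4 comes from.
What you want is a normalized ("canonical") form, so that both
spellings compare equal and have the same length.
Take a look at unicodedata.normalize (note that it is new in
Python 2.3).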

http://www.python.org/doc/current/lib/module-unicodedata.html
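
A small sketch of what I mean, in case it helps. The variable names are
only for illustration, and I am assuming the extra character in your
case really is a combining accent; if the string got its length of 4
some other way, normalizing will not fix it. The escapes are used so the
example does not depend on the encoding of the source file.

```python
import unicodedata

# Two spellings of the same visible text "eló".
composed = u"el\u00f3"     # precomposed LATIN SMALL LETTER O WITH ACUTE
decomposed = u"elo\u0301"  # "o" followed by COMBINING ACUTE ACCENT

print(len(composed))           # 3
print(len(decomposed))         # 4
print(composed == decomposed)  # False, although both display as "eló"

# NFC combines a base character and its combining marks into a single
# precomposed code point wherever one exists.
nfc = unicodedata.normalize("NFC", decomposed)
print(len(nfc))         # 3
print(nfc == composed)  # True
```

If you normalize every string to NFC once, at the point where it enters
your program, len() and comparisons should then behave consistently.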

Hope this helps,

Michael Radziej




