length of unicode strings
mhammond at skippinet.com.au
Thu Aug 22 04:22:13 CEST 2002
Trond Eivind Glomsrød wrote:
> When running on a utf-8 system, python doesn't seem to take its input
> in unicode:
> Python 2.2.1 (#1, Aug 19 2002, 18:04:04)
> [GCC 3.2 (Red Hat Linux Rawhide 3.2-1)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
:( unicode is hard. I won't pretend to understand, but as no other
replies exist this may be useful.
Here we do indeed seem to have a UTF8 representation of the character.
>>> len(unicode('\xc3\xa5', "utf8"))
What we see here is, effectively,
i.e., we are creating a unicode string from a 2-character 8-bit string,
with each byte treated as a separate character.
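
A minimal sketch of that effect, written in Python 3 syntax (an assumption;
the session quoted above is Python 2.2, where unicode(s, enc) plays the role
of bytes.decode(enc)). Latin-1 stands in here for whatever
one-byte-per-character decoding was actually applied, which this thread does
not confirm:

```python
# The two bytes that UTF-8 uses to encode U+00E5 (a with ring above).
raw = b'\xc3\xa5'

# Decoding with a one-byte-per-character codec (Latin-1 is an assumption
# chosen for illustration) maps each byte to its own code point.
mis_decoded = raw.decode('latin-1')

print(len(mis_decoded))    # 2
print(ascii(mis_decoded))  # '\xc3\xa5'
```

The result has length 2 because the two UTF-8 bytes were never combined into
a single code point.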
I'm really not sure what the semantics of the default encoding are here,
but I would expect it to work if you changed the default encoding in site.py.
That isn't generally a good idea, though, and as I don't really understand
how everything interacts in this case, I won't speculate nor advise :)
> Any particular things to configure? Enabling the
> locale.getdefaultlocale() part in site.py doesn't help :(
At the end of the day, it seems the character you want is \xe5, and, if
the bytes are decoded properly (as in the unicode('\xc3\xa5', "utf8") call
above), the len() function works correctly.
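
For illustration, here is the same check in Python 3 syntax (an assumption;
under Python 2.2 the equivalent spelling is the unicode(..., "utf8") call
already shown). It is only a sketch of the decoding step, not of how the
interactive prompt itself reads input:

```python
# UTF-8 encoded bytes for the single character U+00E5.
raw = b'\xc3\xa5'

# Decoding as UTF-8 combines the two bytes into one code point.
decoded = raw.decode('utf-8')

print(len(decoded))        # 1
print(decoded == '\xe5')   # True
```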