29 Dec
2003
6:04 p.m.
Jeff Epler wrote:
u"I".lower() # Python bug? (should be u'\u0131')
u'i'
As Guido says: unicode.lower is not locale-aware; it uses the Unicode Consortium's character properties instead to determine the lower-case character.
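To make this concrete, here is a minimal sketch. It assumes Python 3, where the built-in str type plays the role that unicode played in the session quoted above; the variable names are only illustrative.

```python
import unicodedata

# str.lower() follows the Unicode case mappings and ignores the locale,
# so "I" lowers to the dotted "i" whatever the current locale is.
lowered = "I".lower()
print(repr(lowered))               # 'i'
print(unicodedata.name(lowered))   # LATIN SMALL LETTER I

# The character a Turkish user would expect is a different code point.
dotless = "\u0131"
print(unicodedata.name(dotless))   # LATIN SMALL LETTER DOTLESS I
print(lowered == dotless)          # False
```

Getting the dotless form would need a locale-aware or language-tailored mapping on top of this, which the plain method does not attempt.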
"I".lower() # C library bug? (should be "\xc4\xb1")
'I'
This is really a limitation of the C language, not of the C library. The interface is int tolower(int c); the argument must be representable as an unsigned char (or be EOF), so the function can only accept and return a single byte. Multi-byte characters are not supported by that interface. Traditionally, tolower returns its argument unchanged for characters that cannot be converted.
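The constraint can be modelled in a few lines. This is only a sketch, assuming Python 3: bytewise_lower is a hypothetical stand-in for a one-byte-in, one-byte-out tolower, not the C library's actual code.

```python
# Desired lower-case forms of "I", encoded as UTF-8.
english_target = "i".encode("utf-8")        # one byte: 0x69
turkish_target = "\u0131".encode("utf-8")   # two bytes: 0xC4 0xB1

def bytewise_lower(data, target):
    """Hypothetical model of a single-byte tolower applied to each byte.

    `target` is the encoded lower-case form wanted for b"I". If it does
    not fit in one byte, the input byte is returned unchanged, mirroring
    the traditional fallback.
    """
    out = bytearray()
    for byte in data:
        if byte == ord("I") and len(target) == 1:
            out.append(target[0])
        else:
            out.append(byte)
    return bytes(out)

print(len(english_target), len(turkish_target))  # 1 2
print(bytewise_lower(b"I", english_target))      # b'i'
print(bytewise_lower(b"I", turkish_target))      # b'I'
```

The second call has no single byte it could return for the dotless i, so the input comes back untouched, which matches the 'I' result seen in the Turkish UTF-8 locale.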
"I".lower() # (UTF-8 locale works properly in English)
'i'
This is because "i" is a single byte in UTF-8.

Regards,
Martin