Martin v. Loewis
martin at v.loewis.de
Mon Dec 29 13:04:56 EST 2003
Jeff Epler wrote:
> >>> u"I".lower()  # Python bug? (should be u'\u0131')
As Guido says, unicode.lower() is locale-unaware;
it uses the Unicode Consortium's character properties
to determine the lower-case character instead.
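
A minimal sketch of this behaviour, assuming a present-day Python 3
interpreter (where str plays the role of the unicode type discussed
here). It only shows that lower() does not consult the locale; the
variable names are illustrative:

```python
import unicodedata

# str.lower() applies the Unicode default case mapping, whatever the locale is.
upper_i = "I"
lowered = upper_i.lower()

print(repr(lowered))
print(unicodedata.name(lowered))   # name of the character lower() produced
print(unicodedata.name("\u0131"))  # name of the Turkish lower-case form
```

The result is the ordinary dotted "i", not U+0131, because the
default mapping in the Unicode character database is not tailored
for Turkish.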
> >>> "I".lower()  # C library bug? (should be "\xc4\xb1")
This is really a limitation of the C language, not of
the C library. The interface is
    int tolower(int c);
where c must be a single unsigned char value (or EOF), so it can
only accept and return a single char. Multi-byte characters are
not supported in that interface.
Traditionally, for characters that cannot be converted,
tolower returns its argument.
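
The constraint can be modelled roughly as follows. This is a
hypothetical Python stand-in, not the C library's code: the function
tolower_like and the one-entry table are invented for illustration,
and the table merely imitates a Turkish UTF-8 locale.

```python
# Hypothetical model of a one-char-in, one-char-out case mapper.
# The table stands in for a Turkish UTF-8 locale; it is not a real libc API.
TURKISH_UTF8_LOWER = {ord("I"): b"\xc4\xb1"}  # dotless i needs two bytes


def tolower_like(char_value: int) -> int:
    target = TURKISH_UTF8_LOWER.get(char_value)
    if target is None or len(target) != 1:
        # The result does not fit in a single char: return the argument unchanged.
        return char_value
    return target[0]


result = tolower_like(ord("I"))
print(chr(result))
```

Because the lower-case form would need two bytes, the model has no
single value to return and hands back "I" itself.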
> >>> "I".lower()  # (UTF-8 locale works properly in English)
This is because "i" is a single byte in UTF-8.
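
A quick check of the encoded lengths, again assuming Python 3
(the variable names are illustrative):

```python
# Compare the UTF-8 encodings of the two candidate lower-case forms of "I".
dotted = "i".encode("utf-8")        # English lower-case form
dotless = "\u0131".encode("utf-8")  # Turkish lower-case form

print(len(dotted), dotted)
print(len(dotless), dotless)
```

Only the dotted form fits the single-char interface.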