This post caused me to notice the following behavior. Is it "right"?
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR")
'tr_TR'
>>> locale.getlocale()[1]  # Expected charset
'ISO8859-9'
>>> "I".lower()  # Expected behavior
'\xfd'
>>> u"I".lower()  # Python bug? (should be u'\u0131')
u'i'
Why? Unicode strings are not affected by the locale.
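To make that concrete, here is a minimal sketch of how one might check it. It assumes Python 3, where str is the Unicode type (the session above is Python 2, where that role is played by u"..." strings). The helper name lower_under_locale is hypothetical, and "tr_TR.UTF-8" may not be installed on a given machine, so the sketch reports whether the locale could actually be set.

```python
import locale


def lower_under_locale(text, locale_name):
    """Lowercase text after trying to switch LC_CTYPE to locale_name.

    Returns (lowered_text, applied), where applied says whether the
    requested locale was available on this system.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, locale_name)
            applied = True
        except locale.Error:
            # The locale is not installed here; lowercase anyway.
            applied = False
        return text.lower(), applied
    finally:
        # Always restore whatever LC_CTYPE was before the call.
        locale.setlocale(locale.LC_CTYPE, saved)


for name in ("C", "tr_TR.UTF-8"):
    lowered, applied = lower_under_locale("I", name)
    print(name, applied, repr(lowered))
```

Whether or not the Turkish locale is available, the lowered value is the plain 'i', never the dotless u'\u0131': the Unicode case mapping does not consult LC_CTYPE.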
>>> locale.setlocale(locale.LC_CTYPE, "tr_TR.UTF-8")
'tr_TR.UTF-8'
>>> "I".lower()  # C library bug? (should be "\xc4\xb1")*
'I'
>>> locale.setlocale(locale.LC_CTYPE, "en_US.UTF-8")
'en_US.UTF-8'
>>> "I".lower()  # (UTF-8 locale works properly in english)
'i'
I have no idea what adding UTF-8 to the locale means. Is this something that Python's locale-awareness does, or is it simply recognized by the C library?

--Guido van Rossum (home page: http://www.python.org/~guido/)