[Tutor] string.uppercase: too many for locale

Thu Jan 11 01:28:46 CET 2007

Barnaby Scott wrote:
> Can anyone explain the following: I was getting string.uppercase 
> returning an unexpected number of characters, given that the Python Help 
> says that it should normally be A-Z. Being locale-dependent, I checked 
> that my locale was not set to something exotic, and sure enough it is 
> only what I expected - see below:
> 
> 
> IDLE 1.1      ==== No Subprocess ====
>  >>> import locale, string
>  >>> locale.getlocale()
> ['English_United Kingdom', '1252']
>  >>> print string.uppercase
> ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
>  >>> print string.lowercase
> abcdefghijklmnopqrstuvwxyzƒšœžßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>  >>>
> 
> What am I missing here? Surely for UK English, I really should just be 
> getting A-Z and a-z. In case it is relevant, the platform is Windows 2000.

Interesting. Here is what I get:
 >>> import locale, string
 >>> locale.getlocale()
(None, None)
 >>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

Somehow the locale in your session has been changed from the default 'C'
locale. If I switch to the user's default locale, I get results similar
to yours:
 >>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
 >>> locale.getlocale()
('English_United States', '1252')
 >>> print string.uppercase
ABCDEFGHIJKLMNOPQRSTUVWXYZèîÄƒ└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╪┘┌█▄▌▐

which doesn't print correctly because my console encoding is actually
cp437, not cp1252, so every byte above 0x7F is drawn with a cp437 glyph.
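To see where the garbled line comes from, here is a small Python 3 sketch (the variable names are just for illustration). It takes the cp1252 uppercase string from Barnaby's session, encodes it to the bytes a Python 2 program in a cp1252 locale would hold, and decodes those bytes with the cp437 table that a console like mine uses.

```python
# The cp1252 uppercase string from the session quoted above.
cp1252_upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ"

# In a cp1252 locale, Python 2's string.uppercase holds exactly these bytes.
raw_bytes = cp1252_upper.encode("cp1252")

# A cp437 console draws each byte with its own glyph table. A-Z (below 0x80)
# look the same in both code pages, but e.g. byte 0x8A is 'Š' in cp1252
# and 'è' in cp437.
as_seen_on_console = raw_bytes.decode("cp437")
```

as_seen_on_console comes out equal to the garbled line in my session: none of the 60 bytes changes, only the glyph used to display it.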

It looks like string.uppercase is giving you all the characters that
are uppercase in the current locale's encoding (cp1252 here), which
seems reasonable. If you want just A-Z, use string.ascii_uppercase,
which does not depend on the locale.
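For anyone reading this with Python 3: string.uppercase and string.lowercase were removed there, and only string.ascii_uppercase and string.ascii_lowercase remain. The sketch below shows both ideas. The helper uppercase_in_codepage is hypothetical; it approximates the old locale-dependent behaviour by decoding each of the 256 byte values in a single-byte code page and keeping those that str.isupper() accepts. It uses Unicode's classification rather than the C library's locale tables, so it is not guaranteed to match string.uppercase for every code page, although for cp1252 it reproduces the 60-character string in Barnaby's session.

```python
import string


def uppercase_in_codepage(codepage):
    """Hypothetical helper: return the characters of a single-byte code
    page that Python 3 classifies as uppercase, in byte order (0x00-0xFF).

    This approximates Python 2's locale-dependent string.uppercase using
    Unicode rules (str.isupper), not the C library's locale tables.
    """
    result = []
    for value in range(256):
        try:
            ch = bytes([value]).decode(codepage)
        except UnicodeDecodeError:
            # Some code pages leave bytes unassigned (cp1252: 0x81, 0x8D, ...).
            continue
        if ch.isupper():
            result.append(ch)
    return "".join(result)


# Locale-independent: always exactly the 26 ASCII capitals.
print(string.ascii_uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ

# Code-page-dependent: A-Z plus the accented capitals cp1252 defines.
cp1252_upper = uppercase_in_codepage("cp1252")
```

For a plain ASCII check, string.ascii_uppercase (or a test such as `"A" <= ch <= "Z"`) is the simpler choice; the helper is only useful to see why the locale-dependent constant was longer.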

Kent