[Tutor] string.uppercase: too many for locale

Barnaby Scott bds at waywood.co.uk
Thu Jan 11 10:27:50 CET 2007


Kent Johnson wrote:
> Barnaby Scott wrote:
>> Can anyone explain the following: I was getting string.uppercase 
>> returning an unexpected number of characters, given that the Python 
>> Help says that it should normally be A-Z. Being locale-dependent, I 
>> checked that my locale was not set to something exotic, and sure 
>> enough it is only what I expected - see below:
>>
>>
>> IDLE 1.1      ==== No Subprocess ====
>>  >>> import locale, string
>>  >>> locale.getlocale()
>> ['English_United Kingdom', '1252']
>>  >>> print string.uppercase
>> ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
>>  >>> print string.lowercase
>> abcdefghijklmnopqrstuvwxyzƒšœžßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>>  >>>
>>
>> What am I missing here? Surely for UK English, I really should just be 
>> getting A-Z and a-z. In case it is relevant, the platform is Windows 
>> 2000.
> 
> Interesting. Here is what I get:
>  >>> import locale, string
>  >>> locale.getlocale()
> (None, None)
>  >>> string.uppercase
> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
> 
> Somehow the locale for your system has changed from the 'C' locale. If I 
> set the default locale I get similar results to yours:
>  >>> locale.setlocale(locale.LC_ALL, '')
> 'English_United States.1252'
>  >>> locale.getlocale()
> ('English_United States', '1252')
>  >>> print string.uppercase
> ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╪┘┌█▄▌▐
> 
> which doesn't print correctly because my console encoding is actually 
> cp437 not cp1252.
> 
> It looks like string.uppercase is giving you all the characters which 
> are uppercase in the current encoding, which seems reasonable. You can 
> use string.ascii_uppercase if you want just A-Z.
> 
> Kent
> 
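Kent's explanation can be checked with a small sketch. The Python 2 documentation says string.uppercase is locale-dependent and is updated when locale.setlocale() is called. It is built from the C library's classification of each byte value 0-255, so its contents follow the locale's code page (1252 here), not the letters English uses. The sketch below targets Python 3, where string.uppercase no longer exists. It approximates the Python 2 behaviour by decoding each byte of a code page and keeping whatever str.isupper() accepts. Two assumptions apply. First, str.isupper() uses Unicode character properties rather than the C library, so results could differ for some bytes. Second, the helper name uppercase_for_codepage is hypothetical.

```python
import string


def uppercase_for_codepage(encoding):
    """Hypothetical helper: return, in byte order (0x00-0xFF), the characters
    of a single-byte code page that str.isupper() classifies as uppercase."""
    chars = []
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            # Some byte values are undefined in the code page (e.g. 0x81 in cp1252).
            continue
        if ch.isupper():
            chars.append(ch)
    return "".join(chars)


# Code page 1252: A-Z followed by the accented capitals from Barnaby's output.
cp1252_upper = uppercase_for_codepage("cp1252")

# Pure ASCII: just A-Z, the same as string.ascii_uppercase.
print(uppercase_for_codepage("ascii") == string.ascii_uppercase)  # True

# The same cp1252 bytes shown on a cp437 console, as in Kent's session:
# the accented capitals turn into box-drawing characters.
as_seen_on_cp437 = cp1252_upper.encode("cp1252").decode("cp437")
```

The last line mimics Kent's console: the characters are correct, but their cp1252 bytes are being displayed through code page 437.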
Thanks, but this raises various questions:

Why would my locale have 'changed' - and from what?
What *would* be the appropriate locale given that I am in the UK and use
English, and how would I set it?
Why on earth does the ['English_United Kingdom', '1252'] locale setting
consider ŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ to be appropriate?
Is this less to do with Python than the operating system?
Where can I read more on the subject?

Sorry for all the open-ended questions, but I am baffled by this and can
find no information. Sadly, just using string.ascii_uppercase is not a
solution because I am trying to develop something for different locales,
but only want the actual letters that a particular language uses to be
returned - e.g. English should be A-Z only, Swedish should be A-Z + ÅÄÖ
(only) etc. The thing I really want to avoid is having to hard-code for
every language on the planet - surely this is the whole point of locale
settings, and locale-dependent functions and constants?
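The uppercase set follows the code page, not the language, which is why the locale alone does not yield "the letters a language uses". A Western European code page such as 1252 (the one in the English_United Kingdom locale above) contains Swedish ÅÄÖ, but also many other accented capitals. This is a sketch under the same assumptions as before: Python 3, with str.isupper() standing in for the C library's classification. The helper uppercase_set is hypothetical, and cp1250 (Central European) is used only as an example of a different code page.

```python
import string


def uppercase_set(encoding):
    """Hypothetical helper: the set of characters in a single-byte code page
    that str.isupper() classifies as uppercase."""
    result = set()
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte value undefined in this code page
        if ch.isupper():
            result.add(ch)
    return result


western = uppercase_set("cp1252")
swedish_extra = set("ÅÄÖ")  # the extra Swedish capitals from the question above

# The code page does contain ÅÄÖ ...
print(swedish_extra <= western)  # True
# ... but also 31 other non-ASCII capitals (Š, Œ, Ž, Ÿ, À, Ç, ...).
print(len(western - set(string.ascii_uppercase) - swedish_extra))  # 31
# A different code page gives a different set (cp1250 has Ł, for example).
print(uppercase_set("cp1250") == western)  # False
```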

Thanks

Barnaby Scott


