[Tutor] string.uppercase: too many for locale
Barnaby Scott
bds at waywood.co.uk
Thu Jan 11 10:27:50 CET 2007
Kent Johnson wrote:
> Barnaby Scott wrote:
>> Can anyone explain the following: I was getting string.uppercase
>> returning an unexpected number of characters, given that the Python
>> Help says that it should normally be A-Z. Being locale-dependent, I
>> checked that my locale was not set to something exotic, and sure
>> enough it is only what I expected - see below:
>>
>>
>> IDLE 1.1 ==== No Subprocess ====
>> >>> import locale, string
>> >>> locale.getlocale()
>> ['English_United Kingdom', '1252']
>> >>> print string.uppercase
>> ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
>> >>> print string.lowercase
>> abcdefghijklmnopqrstuvwxyzƒšœžßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> >>>
>>
>> What am I missing here? Surely for UK English, I really should just be
>> getting A-Z and a-z. In case it is relevant, the platform is Windows
>> 2000.
>
> Interesting. Here is what I get:
> >>> import locale, string
> >>> locale.getlocale()
> (None, None)
> >>> string.uppercase
> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>
> Somehow the locale for your system has changed from the 'C' locale. If I
> set the default locale I get similar results to yours:
> >>> locale.setlocale(locale.LC_ALL, '')
> 'English_United States.1252'
> >>> locale.getlocale()
> ('English_United States', '1252')
> >>> print string.uppercase
> ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╪┘┌█▄▌▐
>
> which doesn't print correctly because my console encoding is actually
> cp437 not cp1252.
>
> It looks like string.uppercase is giving you all the characters which
> are uppercase in the current encoding, which seems reasonable. You can
> use string.ascii_uppercase if you want just A-Z.
>
> Kent
>
Thanks, but this raises various questions:
Why would my locale have 'changed' - and from what?
What *would* be the appropriate locale given that I am in the UK and use
English, and how would I set it?
Why on earth does the ['English_United Kingdom', '1252'] locale setting
consider ŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ to be appropriate?
Is this less to do with Python than the operating system?
Where can I read more on the subject?
Sorry for all the open-ended questions, but I am baffled by this and can
find no information. Sadly, just using string.ascii_uppercase is not a
solution: I am trying to develop something that works across different
locales, and I want only the letters that a particular language actually
uses to be returned - e.g. English should be A-Z only, Swedish should be
A-Z + ÅÄÖ (only), etc. The thing I really want to avoid is having to
hard-code for every language on the planet - surely this is the whole
point of locale settings, and locale-dependent functions and constants?
Thanks
Barnaby Scott
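
On the question of why a cp1252 locale reports those accented capitals, one
plausible reading is that the classification follows the code page rather
than the language: cp1252 serves many Western European languages, so every
capital letter it can encode counts as uppercase. The sketch below only
illustrates that idea. It assumes Python 3 (where string.uppercase no
longer exists) and uses Unicode's notion of "uppercase" as a stand-in for
whatever the Windows C runtime does; uppercase_in_codepage is a
hypothetical helper name, not part of any library.

```python
import string


def uppercase_in_codepage(codec):
    """Collect every character of a single-byte codec that Unicode
    classifies as uppercase (an approximation of the C runtime's view)."""
    found = []
    for byte_value in range(256):
        # Unassigned byte values decode to "" because of errors="ignore".
        char = bytes([byte_value]).decode(codec, errors="ignore")
        if char and char.isupper():
            found.append(char)
    return "".join(found)


cp1252_upper = uppercase_in_codepage("cp1252")

# Letters beyond plain A-Z that the code page itself can represent.
extras = [c for c in cp1252_upper if c not in string.ascii_uppercase]
print(len(string.ascii_uppercase), "ASCII capitals,", len(extras), "extra capitals")
```

If this reading is right, the locale name alone will not narrow the result
to one language's alphabet, since 'English_United Kingdom' and a Swedish
locale would share the same 1252 code page.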