Sorting strings containing special characters (German 'Umlaute')
Peter Otten
__peter__ at web.de
Fri Mar 2 09:25:49 EST 2007
DierkErdmann at mail.com wrote:
> I know that this topic has been discussed in the past, but I could not
> find a working solution for my problem: sorting (lists of) strings
> containing special characters like "ä", "ü",... (German umlauts).
> Consider the following list:
> l = ["Aber", "Beere", "Ärger"]
>
> For sorting, the letter "Ä" is supposed to be treated like "Ae",
I don't think so:
>>> sorted(["Ast", "Ärger", "Ara"], locale.strcoll)
['Ara', '\xc3\x84rger', 'Ast']
>>> sorted(["Ast", "Aerger", "Ara"])
['Aerger', 'Ara', 'Ast']
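Both orders are in use: dictionaries treat "Ä" like "A" (which is what strcoll did above), while name lists such as phone books spell it out as "Ae". The Python 3 sketch below makes the two conventions explicit without relying on any locale. The translation tables and the names `dictionary_key` and `phonebook_key` are hypothetical helpers for illustration; they handle only umlauts and "ß" and ignore case and every other collation rule.

```python
# Hypothetical helpers: two ways to sort German words without a locale.
# Only umlauts and sharp s are handled; case and other rules are ignored.

# "Ä" sorts like "A" (dictionary order).
_FOLD = str.maketrans({
    "Ä": "A", "Ö": "O", "Ü": "U",
    "ä": "a", "ö": "o", "ü": "u", "ß": "ss",
})
# "Ä" sorts like "Ae" (phone-book order).
_EXPAND = str.maketrans({
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
})


def dictionary_key(word):
    # The original word breaks ties, e.g. between "Arger" and "Ärger".
    return (word.translate(_FOLD), word)


def phonebook_key(word):
    return (word.translate(_EXPAND), word)


if __name__ == "__main__":
    words = ["Ast", "Ärger", "Ara"]
    print(sorted(words, key=dictionary_key))  # ['Ara', 'Ärger', 'Ast']
    print(sorted(words, key=phonebook_key))   # ['Ärger', 'Ara', 'Ast']
```

With either key the list from the question comes out as ["Aber", "Ärger", "Beere"]; the two conventions only disagree on pairs like "Ärger" and "Ara".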
> therefore sorting this list should yield
> l = ["Aber", "Ärger", "Beere"]
>
> I know about the module locale and its function strcoll(string1,
> string2), but currently this does not work correctly for me. Consider
> >>> locale.strcoll("Ärger", "Beere")
> 1
>
> Therefore "Ärger" is sorted after "Beere", which is not correct IMO.
> Can someone help?
>
> Btw: I'm using WinXP (German) and
>>>> locale.getdefaultlocale()
> prints
> ('de_DE', 'cp1252')
The default locale is not used automatically: a Python program collates in the "C" locale until you activate the user's locale explicitly with setlocale():
>>> import locale
>>> locale.strcoll("Ärger", "Beere")
1
>>> locale.setlocale(locale.LC_ALL, "")
'de_DE.UTF-8'
>>> locale.strcoll("Ärger", "Beere")
-1
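A Python 3 version of the same steps might look like the sketch below. It is an illustration, not the only way: the helper name `collate` is hypothetical, it sets only the LC_COLLATE category, and it wraps strcoll with functools.cmp_to_key because Python 3's sorted() no longer accepts a cmp function. Locale names are platform-specific; "de_DE.UTF-8" is an assumption that holds on typical Linux systems with that locale installed, and Windows uses different names.

```python
import functools
import locale


def collate(words, locale_name=""):
    """Sort words using the C library's collation rules for locale_name.

    An empty locale_name means "use the environment" (LC_ALL,
    LC_COLLATE, LANG), as setlocale(LC_ALL, "") does in the session above.
    Raises locale.Error if the requested locale is not available.
    """
    # Note: this changes process-wide state, not just this call.
    locale.setlocale(locale.LC_COLLATE, locale_name)
    # Python 3's sorted() has no cmp argument, so adapt strcoll to a key.
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


if __name__ == "__main__":
    words = ["Aber", "Beere", "Ärger"]
    # "C" collates by code point, so "Ä" (U+00C4) lands after "B".
    print(collate(words, "C"))
    try:
        print(collate(words, "de_DE.UTF-8"))
    except locale.Error:
        print("de_DE.UTF-8 is not installed on this system")
```

Restricting the change to LC_COLLATE leaves number and date formatting untouched; pass locale.LC_ALL instead if you want the full behaviour of the session above.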
By the way, you will avoid a lot of "Ärger"* if you use unicode right from
the start.
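Much of that trouble comes from byte strings: the same word has different bytes in different encodings, and a byte-based strcoll only interprets them correctly if the locale's encoding matches. The OP's locale uses cp1252, while the sessions above show UTF-8 bytes ('\xc3\x84'). A small Python 3 illustration (the variable names are just for this example):

```python
word = "Ärger"  # a str: a sequence of Unicode code points

# The same text as bytes depends on the encoding.
utf8_bytes = word.encode("utf-8")     # b'\xc3\x84rger', as in the sessions above
cp1252_bytes = word.encode("cp1252")  # b'\xc4rger', the Windows locale's encoding

# Reading UTF-8 bytes as cp1252 silently yields different characters,
# roughly what a byte-based strcoll under a cp1252 locale would see.
garbled = utf8_bytes.decode("cp1252")  # 'Ã„rger'

# Decoding once, with the right codec, gives back the same text either way;
# sorting and collating str values then no longer depends on the encoding.
assert utf8_bytes.decode("utf-8") == cp1252_bytes.decode("cp1252") == word
print(utf8_bytes, cp1252_bytes, garbled)
```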
Finally, for efficient sorting, a key function is preferable to a cmp
function: the key is computed once per item, whereas a cmp function is
called for every comparison:
>>> sorted(["Ast", "Ärger", "Ara"], key=locale.strxfrm)
['Ara', '\xc3\x84rger', 'Ast']
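The difference in work can be made visible by counting calls. The sketch below uses plain string comparison so that it runs without any particular locale; with locale.strxfrm as the key and locale.strcoll as the cmp function the counts would be the same, and each strcoll call has to redo, roughly, the transformation that strxfrm performs once per item. In Python 3 a cmp function must be wrapped with functools.cmp_to_key; the helper name `count_calls` is hypothetical.

```python
import functools
import random


def count_calls(words):
    """Return (cmp_calls, key_calls) for sorting words both ways."""
    calls = {"cmp": 0, "key": 0}

    def cmp(a, b):
        calls["cmp"] += 1
        return (a > b) - (a < b)

    def key(s):
        calls["key"] += 1
        return s

    by_cmp = sorted(words, key=functools.cmp_to_key(cmp))
    by_key = sorted(words, key=key)
    assert by_cmp == by_key  # same order, different amount of work
    return calls["cmp"], calls["key"]


if __name__ == "__main__":
    rng = random.Random(0)
    words = ["".join(rng.choice("abcdefgh") for _ in range(6)) for _ in range(1000)]
    # The key function runs exactly once per word; cmp runs far more often.
    print(count_calls(words))
```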
Peter
(*) German for "trouble"