Trouble sorting lists (unicode/locale related?)

Peter Otten __peter__ at web.de
Sun Sep 21 08:10:16 EDT 2003


Erlend Fuglum wrote:

> Hi everyone,
> 
> I'm having some trouble sorting lists. I suspect this might have
> something to do with locale settings and/or character
> encoding/unicode.
> 
> Consider the following example, text containing the Norwegian special
> characters æ, ø and å.
> 
>>>> liste = ["ola", "erlend", "trygve", "Ærlige anders", "Lars",
>...          "Øksemorderen", "Åsne", "Akrobatiske Anna", "leidulf"]
>>>> liste.sort()
>>>> liste
> ['Akrobatiske Anna', 'Lars', 'erlend', 'leidulf', 'ola', 'trygve',
> '\xc5sne', '\xc6rlige anders', '\xd8ksemorderen']
> 
> There are a couple of issues for me here:
> * The sorting method apparently places strings starting with uppercase
> characters before strings starting with lowercase. I would like to
> treat them equally when sorting. OK, this could probably be fixed
> by hacking with .upper() or something, but isn't it possible to
> achieve this in a more elegant way?
> 
> * The Norwegian special characters are sorted in the wrong order.
> According to our alphabet the correct order is (...) x, y, z, æ, ø, å.
> Python does it this way: (...) x, y, z, å, æ, ø ?
> 
> I would really appreciate any help and suggestions - I have been
> fiddling with this mess for quite some time now :-)

Try setting the appropriate locale first:

import locale
locale.setlocale(locale.LC_ALL, ("no", None))
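
The name "no" is not installed on every system, in which case the call
above raises locale.Error. A minimal sketch of a more forgiving setup
follows. The candidate names (nb_NO.UTF-8 and friends) and the helper
set_first_available_locale are assumptions made for illustration, not part
of the locale module, and the sketch only touches LC_COLLATE rather than
LC_ALL because sorting needs nothing more:

```python
import locale

# Assumed candidate names; which of them exist depends on the operating
# system. "C" is listed last as a fallback that is always available.
NORWEGIAN_CANDIDATES = ["nb_NO.UTF-8", "no_NO.UTF-8", "nb_NO", "no_NO", "no", "C"]


def set_first_available_locale(candidates, category=locale.LC_COLLATE):
    """Hypothetical helper: activate the first usable locale name.

    Returns the name that was accepted, or None if every name failed.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:
            continue  # not installed here, try the next candidate
    return None


chosen = set_first_available_locale(NORWEGIAN_CANDIDATES)
print("collation locale in use:", chosen)
```

If the helper ends up on "C", sorting still works but follows plain code
point order, so æ, ø and å will not come out in the Norwegian order.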

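Then for a sort that follows the locale's collation rules (in most locales
this also stops uppercase from sorting ahead of lowercase):

liste.sort(locale.strcoll)

should do on Python 2 (disclaimer: all untested). Python 3 no longer accepts
a comparison function there; the equivalent is liste.sort(key=locale.strxfrm).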
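
For Python 3, where list.sort() and sorted() take only a key function, the
two issues from the question can be sketched separately. This is an
illustration under assumptions, not a tested recipe: the locale name
nb_NO.UTF-8 is a guess that may not exist on your system, and the variable
names by_case, by_locale and by_strcoll are made up for the example.

```python
import functools
import locale

liste = ["ola", "erlend", "trygve", "Ærlige anders", "Lars",
         "Øksemorderen", "Åsne", "Akrobatiske Anna", "leidulf"]

# Issue 1, case only: compare lower-cased copies, keep the items unchanged.
# This needs no locale, but å still sorts before æ and ø (code point order).
by_case = sorted(liste, key=str.lower)
print(by_case)

# Issue 2, alphabet order: ask the C library for Norwegian collation.
try:
    locale.setlocale(locale.LC_COLLATE, "nb_NO.UTF-8")  # assumed name
except locale.Error:
    pass  # not installed; the calls below then follow the current locale

# strxfrm maps each string to a key that compares per LC_COLLATE.
by_locale = sorted(liste, key=locale.strxfrm)

# Closest Python 3 spelling of sort(locale.strcoll): wrap the comparison
# function with cmp_to_key.
by_strcoll = sorted(liste, key=functools.cmp_to_key(locale.strcoll))

print(by_locale)
```

The result of the locale-based sorts depends on which locale is actually
active, so check the printed list on your own machine. strxfrm is usually
the better choice for long lists because each key is computed once, whereas
strcoll is called for every comparison.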

Peter





