Proposal: require 7-bit source str's

"Martin v. Löwis" martin at v.loewis.de
Sun Aug 22 16:53:28 EDT 2004


Hallvard B Furuseth wrote:
>>>For example, if one uses character set ns_4551-1 - ASCII with {|}[\]
>>>replaced with æøåÆØÅ, sorting by simple byte ordering will sort text
>>>correctly.  Unicode text _can't_ be sorted correctly, because of
>>>characters like 'ö': Swedish 'ö' should match Norwegian 'ø' and sort
>>>with that, while German 'ö' should not match 'ø' and sorts with 'o'.
>>
>>Why not sort depending on the locale instead of ordinal values of the
>>bytes/characters?
> 
> 
> I'm in Norway.  Both Swedes and Germans are foreigners.

I agree with many things you said, but this example is bogus. If I
(as a German) use ns_4551-1, sorting is simple - and incorrect, because,
as you say, ö sorts with o in my language - yet the simple sorting of
ns_4551-1 doesn't. So sorting is *not* simple with ns_4551-1.

Likewise, sorting *is* possible with Unicode if you take the locale into
account. The order of characters doesn't have to be the numerical one,
and, as you explain, it might even depend on the locale. So if you
want a Swedish collation, use a Swedish locale; if you want a German
collation, use a German locale.
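
To illustrate, here is a minimal sketch of locale-dependent ordering over
the same Unicode strings. The key functions german_key and swedish_key and
their tables are hypothetical, hand-written simplifications (ö sorts with o
in the German one, and after z in the Swedish one), not complete collation
rules. In real code you would more likely call
locale.setlocale(locale.LC_COLLATE, ...) and sort with key=locale.strxfrm,
but which locale names are installed depends on the platform.

```python
# Toy collation keys. The tables below are simplified assumptions,
# not complete German or Swedish collation rules.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def german_key(word):
    # German-style ordering: fold umlauts so 'ö' sorts together with 'o'.
    return word.lower().translate(GERMAN_FOLD)


def swedish_key(word):
    # Swedish-style ordering: 'å', 'ä', 'ö' are separate letters after 'z'.
    # Characters outside the toy alphabet are placed last.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]


words = ["zebra", "öl", "ost", "apa"]

german_sorted = sorted(words, key=german_key)
swedish_sorted = sorted(words, key=swedish_key)

print(german_sorted)   # ['apa', 'öl', 'ost', 'zebra']
print(swedish_sorted)  # ['apa', 'ost', 'zebra', 'öl']
```

The strings and their code points are identical in both calls; only the
key function, standing in for the locale, changes the resulting order.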

Regards,
Martin


