[ python-Bugs-1519069 ] incorrect locale.strcoll() return in Windows

SourceForge.net noreply at sourceforge.net
Sat Jul 15 09:37:09 CEST 2006


Bugs item #1519069, was opened at 2006-07-08 05:04
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1519069&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Windows
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Brian Matherly (pez4brian)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect locale.strcoll() return in Windows

Initial Comment:
Python 2.4.2 in Windows (English locale):

>>> import locale
>>> locale.setlocale(locale.LC_ALL,'C')
'C'
>>> locale.setlocale(locale.LC_ALL,'')
'English_United States.1252'
>>> locale.strcoll("M","m")
1
>>> locale.strcoll("Ma","mz")
-1

It appears that when a string has one character, "M" is
greater than "m", but when it has more than one string,
"M" is equal to "m"

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-07-15 09:37

Message:
Logged In: YES 
user_id=21627

You should ask these questions in some Win32 programmer
newsgroup. I don't know whether this sorting is correct or
not; I'm not a native English speaker.

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-15 05:14

Message:
Logged In: YES 
user_id=726294

Thanks for your response. That is simply unacceptable. Who
at Microsoft needs to be flogged? More likely, this shows my
lack of understanding of strings and locale in general.

Your explanation does explain the results I get, but
wouldn't you admit that the results *seem* wrong?

By the definition given, the strings "Ma", "mb", "Mc", "md"
would actually sort in that order! So the list of sorted
strings would have alternating capitalization!

However, the list of strings "M", "m", "M", "m" would sort
as "m", "m", "M", "M" - no alternating capitalization - as I
would expect.

Would there happen to be some way to sort the strings using
the locale, but also using the case earlier in the
computation order? Basically, I want the sort to be case
sensitive.

Thanks again for your response. If you have any suggestions
that might help me achieve what I want, it would be greatly
appreciated.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-07-14 17:55

Message:
Logged In: YES 
user_id=21627

Why do you think this is a bug? We pass the string as-is to
the C library, which passes it nearly as-is to
CompareStringW. This function then decides how they collate;
in Microsoft's definition of the English_United States
locale, these strings do have the order you get.

In case you wonder how the order is computed: essentially,
the strings are first compared case-insensitively, ignoring
diacritics. If they compare equal, the diacritics are
considered. If they still compare equal, case weights are
considered. If they still compare equal, special weights
are considered.

(Note: I obtained this indirectly by looking at the
LCMapString documentation, assuming that CompareString uses
LCMapString with LCMAP_SORTKEY|SORT_STRINGSORT).
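
To illustrate the multi-level comparison described above, here is a rough
sketch in Python 3. It is only a toy model, not what CompareStringW
actually does: the names sort_key and compare are made up for this
example, the special weights level is left out, and the assumption that
lowercase sorts before uppercase at the case level is taken from the
strcoll("M","m") result in the initial comment.

```python
import unicodedata


def sort_key(s):
    """Hypothetical three-level key: base letters, diacritics, case."""
    # Split accented characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    base = [c for c in decomposed if not unicodedata.combining(c)]
    # Level 1: base letters, ignoring case and diacritics.
    primary = [c.lower() for c in base]
    # Level 2: diacritics, consulted only if level 1 ties.
    diacritics = [c for c in decomposed if unicodedata.combining(c)]
    # Level 3: case, consulted only if levels 1 and 2 tie.
    # False < True, so lowercase sorts before uppercase (an assumption).
    case = [c.isupper() for c in base]
    return (primary, diacritics, case)


def compare(a, b):
    """Return -1, 0 or 1, in the style of locale.strcoll()."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


print(compare("M", "m"))    # 1: levels 1 and 2 tie, case decides
print(compare("Ma", "mz"))  # -1: level 1 already differs ("a" < "z")
print(sorted(["md", "Mc", "mb", "Ma"], key=sort_key))
# ['Ma', 'mb', 'Mc', 'md']
```

In this model, case only matters when the strings are otherwise
identical, which is why the two results in the initial comment do not
contradict each other.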

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-08 05:35

Message:
Logged In: YES 
user_id=726294

I see the same problem in Python 2.4.3.

----------------------------------------------------------------------

Comment By: Brian Matherly (pez4brian)
Date: 2006-07-08 05:08

Message:
Logged In: YES 
user_id=726294

Correction:

It appears that when a string has one character, "M" is
greater than "m", but when it has more than one character,
"M" is equal to "m"

----------------------------------------------------------------------
