The PS: at the top of Misc/ACKS says:
PS: In the standard Python distribution, this file is encoded in UTF-8 and the list is in rough alphabetical order by last names.
However, the last 3 names in the list don't appear to be part of that alphabetical order. Is this somehow intentional, or just a mistake?
Eli
Hi, On 11/11/2011 10.39, Eli Bendersky wrote:
The PS: at the top of Misc/ACKS says:
PS: In the standard Python distribution, this file is encoded in UTF-8 and the list is in rough alphabetical order by last names.
However, the last 3 names in the list don't appear to be part of that alphabetical order. Is this somehow intentional, or just a mistake?
Only the last two are out of place, and should be fixed. The 'Å' in "Peter Åstrand" sorts after 'Z'. See http://mail.python.org/pipermail/python-dev/2010-August/102961.html for a discussion about the order of Misc/ACKS. Best Regards, Ezio Melotti
On 11.11.2011 10:56, Ezio Melotti wrote:
Hi,
On 11/11/2011 10.39, Eli Bendersky wrote:
The PS: at the top of Misc/ACKS says:
PS: In the standard Python distribution, this file is encoded in UTF-8 and the list is in rough alphabetical order by last names.
However, the last 3 names in the list don't appear to be part of that alphabetical order. Is this somehow intentional, or just a mistake?
Only the last two are out of place, and should be fixed. The 'Å' in "Peter Åstrand" sorts after 'Z'. See http://mail.python.org/pipermail/python-dev/2010-August/102961.html for a discussion about the order of Misc/ACKS.
The key point here is that it is *rough* alphabetic order. IMO, sorting accented characters along with their unaccented versions would be fine as well, and be more practical. In general, it's not possible to provide a "correct" alphabetic order. For example, in German, 'ö' sorts after 'o', whereas in Swedish, it sorts after 'z'. In fact, in German, we have two different ways of sorting the ö: one is to treat it as a letter after o, and the other is to treat it as equivalent to oe. Regards, Martin
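The two approaches Martin describes can be sketched in Python as sort keys. This is only an illustration, not how Misc/ACKS is actually maintained: the helper names `fold_accents` and `german_phonebook_key` and the sample surnames are made up for the example, and the accent folding relies on Unicode decomposition, so letters without a decomposition (such as 'ø') are left untouched.

```python
import unicodedata


def fold_accents(name):
    """Sort key (hypothetical helper): accented letters sort with their base letter."""
    # NFD splits 'ö' into 'o' plus a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def german_phonebook_key(name):
    """Sort key (hypothetical helper): treat ä/ö/ü as ae/oe/ue, then fold the rest."""
    # NFC first, so that the precomposed umlauts in the table below are matched.
    composed = unicodedata.normalize("NFC", name).casefold()
    expanded = composed.translate(str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"}))
    return fold_accents(expanded)


# Made-up surnames, chosen only to show where the two keys disagree.
surnames = ["Goz", "Göthe", "Goethe", "Gof"]

with_base_letter = sorted(surnames, key=fold_accents)
as_oe = sorted(surnames, key=german_phonebook_key)

print(with_base_letter)  # 'Göthe' is placed as if spelled 'Gothe'
print(as_oe)             # 'Göthe' is placed as if spelled 'Goethe'
```

Under the second key "Göthe" and "Goethe" compare equal, so their relative order is whatever it was in the input; a real list would need a tie-breaker.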
The key point here is that it is *rough* alphabetic order. IMO, sorting accented characters along with their unaccented versions would be fine as well, and be more practical. In general, it's not possible to provide a "correct" alphabetic order. For example, in German, 'ö' sorts after 'o', whereas in Swedish, it sorts after 'z'. In fact, in German, we have two different ways of sorting the ö: one is to treat it as a letter after o, and the other is to treat it as equivalent to oe.
This is really interesting. I guess lexical ordering of alphabet letters is a locale thing, but Misc/ACKS isn't supposed to be any special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone. We can then call it "the Misc/ACKS incompleteness theorem" ;-) Eli
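A small sketch of the contradiction Eli asks about, using the German and Swedish rules for 'ö' mentioned above. The two key functions are deliberately simplified stand-ins for the real national rules, and the names `swedish_key`, `german_key` and the sample surnames are assumptions made for the example.

```python
import unicodedata

# Simplified Swedish alphabet: å, ä, ö are separate letters placed after z.
SWEDISH_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")


def swedish_key(name):
    """Hypothetical key: rank each letter by its position in the Swedish alphabet."""
    text = unicodedata.normalize("NFC", name).casefold()
    # Characters outside the alphabet are skipped in this sketch.
    return [SWEDISH_ALPHABET.index(ch) for ch in text if ch in SWEDISH_ALPHABET]


def german_key(name):
    """Hypothetical key: umlauts sort together with their base vowel."""
    text = unicodedata.normalize("NFC", name).casefold()
    return text.translate(str.maketrans("äöü", "aou"))


# Made-up surnames for illustration.
names = ["Öberg", "Olsen", "Zimmer"]

german_order = sorted(names, key=german_key)
swedish_order = sorted(names, key=swedish_key)

print(german_order)   # 'Öberg' comes first, next to the other O names
print(swedish_order)  # 'Öberg' comes last, after 'Zimmer'
```

The German rule puts "Öberg" before "Zimmer" and the Swedish rule puts it after, so no single ordering of these three names satisfies both conventions.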
Eli Bendersky writes:
special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone.
Yes, it is. The examples were already given in this thread. The Han-using languages also have this problem, and Japanese is nondeterministic all by itself (there are kanji names which for historical reasons are pronounced in several different ways, and therefore cannot be placed in phonetic order without additional information). The sensible thing is to just sort in Unicode code point order, I think.
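For reference, code point order is what Python 3 gives when strings are compared without a key, so a plain `sorted()` call is enough. A minimal sketch with made-up surnames (only "Åstrand" comes from the thread):

```python
# Python 3 compares str objects code point by code point,
# so sorted() without a key yields Unicode code point order.
surnames = ["Zwolle", "Åstrand", "adams", "Adams"]

by_code_point = sorted(surnames)

# 'A' is U+0041, 'Z' is U+005A, 'a' is U+0061, 'Å' is U+00C5.
print(by_code_point)
print([hex(ord(name[0])) for name in by_code_point])
```

The order is deterministic and locale-independent, but it puts every uppercase ASCII letter before every lowercase one, and accented letters after both.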
On 11/11/2011 11:03 PM, Stephen J. Turnbull wrote:
The sensible thing is to just sort in Unicode code point order, I think.
I was going to suggest the official Unicode Collation Algorithm: http://unicode.org/reports/tr10/ But I peeked in the can, saw it was chock-a-block with worms, and declined to open it. /larry/
On 12.11.2011 08:03, Stephen J. Turnbull wrote:
Eli Bendersky writes:
special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone.
Yes, it is. The examples were already given in this thread. The Han-using languages also have this problem, and Japanese is nondeterministic all by itself (there are kanji names which for historical reasons are pronounced in several different ways, and therefore cannot be placed in phonetic order without additional information).
The sensible thing is to just sort in Unicode code point order, I think.
The sensible thing is to accept that there is no solution, and to stop worrying. Georg
On 2011-11-12, at 10:24 , Georg Brandl wrote:
On 12.11.2011 08:03, Stephen J. Turnbull wrote:
Eli Bendersky writes:
special locale. It makes me wonder whether it's possible to have a contradiction in the ordering, i.e. have a set of names that just can't be sorted in any order acceptable by everyone.
Yes, it is. The examples were already given in this thread. The Han-using languages also have this problem, and Japanese is nondeterministic all by itself (there are kanji names which for historical reasons are pronounced in several different ways, and therefore cannot be placed in phonetic order without additional information).
The sensible thing is to just sort in Unicode code point order, I think.
The sensible thing is to accept that there is no solution, and to stop worrying.
The file could use the default collation order, that way it'd be incorrectly sorted for everybody.
Xavier Morel writes:
On 2011-11-12, at 10:24 , Georg Brandl wrote:
On 12.11.2011 08:03, Stephen J. Turnbull wrote:
The sensible thing is to just sort in Unicode code point order, I think.
The sensible thing is to accept that there is no solution, and to stop worrying.
The file could use the default collation order, that way it'd be incorrectly sorted for everybody.
"What I tell you three times is true."
participants (8)
- "Martin v. Löwis"
- Barry Warsaw
- Eli Bendersky
- Ezio Melotti
- Georg Brandl
- Larry Hastings
- Stephen J. Turnbull
- Xavier Morel