Re: [Python-Dev] [Python-checkins] r83893 - python/branches/release27-maint/Misc/ACKS
On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky
+PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand
From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A).
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
2010/8/9 Nick Coghlan
On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky
wrote: +PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand
From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A).
This is correct. Don't think of Å as a kind of "A". It's its own letter, which sorts after Z in Swedish. -- Regards, Benjamin
Was it on IRC? I do remember discussion, but forgot the answer. :(
Do you agree that ACKS should be the same in the active branches? I'll
fix the order when I merge the lists.
On Aug 9, 2010, at 10:53 PM, Benjamin Peterson
2010/8/9 Nick Coghlan
: On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky
wrote: +PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand
From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A).
This is correct. Don't think of Å as a kind of "A". It's its own letter, which sorts after Z in Swedish.
-- Regards, Benjamin
On Tue, Aug 10, 2010 at 1:24 PM, Alexander Belopolsky
Was it on IRC? I do remember discussion, but forgot the answer. :(
python-dev or python-checkins I think, but I don't really remember. (Not IRC though, as I only very rarely drop in on the channel)
Do you agree that ACKS should be the same in the active branches? I'll fix the order when I merge the lists.
The most important one to keep up to date is the one for the main development branch, since that should be a superset of all the others. The maintenance branches will naturally be missing new contributors (aside from those contributing bug fixes for that branch), and that's OK. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Aug 9, 2010 at 11:32 PM, Nick Coghlan
On Tue, Aug 10, 2010 at 1:24 PM, Alexander Belopolsky
wrote: Was it on IRC? I do remember discussion, but forgot the answer. :(
python-dev or python-checkins I think, but I don't really remember. (Not IRC though, as I only very rarely drop in on the channel)
I'll search the archives. My reasoning was that Å in Åstrand was the same as Å in Ångström. Webster's dictionary (1992 edition that was on my bookshelf) has Ångström (Anders Jonas) between angstrom (the unit) and Anguilla (an island).
Do you agree that ACKS should be the same in the active branches? I'll fix the order when I merge the lists.
The most important one to keep up to date is the one for the main development branch, since that should be a superset of all the others. The maintenance branches will naturally be missing new contributors (aside from those contributing bug fixes for that branch), and that's OK.
As I mentioned in a tracker comment, it may be useful to sync the lists between the main and maintenance branches to avoid svnmerge conflicts. I think I've seen some names missing from the main branch which exist in maintenance ones.
Am 10.08.2010 04:47, schrieb Nick Coghlan:
On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky
wrote: +PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand
From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A).
That's why it says "rough" alphabetical order. Putting it into either place sounds reasonable - we can just expect people to change it back and forth. Regards, Martin
Am 10.08.2010 05:49, schrieb Alexander Belopolsky:
On Mon, Aug 9, 2010 at 11:32 PM, Nick Coghlan
wrote: On Tue, Aug 10, 2010 at 1:24 PM, Alexander Belopolsky
wrote: Was it on IRC? I do remember discussion, but forgot the answer. :(
python-dev or python-checkins I think, but I don't really remember. (Not IRC though, as I only very rarely drop in on the channel)
I'll search the archives. My reasoning was that Å in Åstrand was the same as Å in Ångström. Webster's dictionary (1992 edition that was on my bookshelf) has Ångström (Anders Jonas) between angstrom (the unit) and Anguilla (an island).
People need to recognize that any kind of reference is really irrelevant here. There is no "right" order that is better than any other "right" order. I'd personally object to any English language dictionary telling me how my name sorts in the alphabet. (and yes, I do think it's "wrong" that it got sorted after Lyngvig - in Germany, we put the ö as if it was "oe" - unlike the Swedes, who put the very same letter after the rest of the alphabet. So the ö in Chrigström sorts in a different way than the ö in Löwis. If I move to Sweden, the file would have to change :-) Regards, Martin
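The German and Swedish treatments of ö described above can be sketched as two sort keys. This is only an illustration: `german_key`, `swedish_key`, and the name "Lopez" are made up for the example, and real collation rules cover far more than these few letters.

```python
# Hypothetical helpers for illustration; not a complete collation.

def german_key(name):
    """Treat umlauts as base vowel + 'e' (ö as "oe"), as described in the message."""
    table = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}
    return "".join(table.get(ch, ch) for ch in name.lower())

SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(name):
    """Rank each letter by its position in the Swedish alphabet (å, ä, ö after z)."""
    unknown = len(SWEDISH_ALPHABET)  # anything else sorts last in this sketch
    return [SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET else unknown
            for ch in name.lower()]

names = ["Lyngvig", "Löwis", "Lopez"]  # "Lopez" is an invented entry
print(sorted(names, key=german_key))   # Löwis before Lopez and Lyngvig
print(sorted(names, key=swedish_key))  # Löwis after Lyngvig
```

The same three names come out in two different orders, which is the point being made: neither result is more "right" than the other.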
Benjamin Peterson writes:
2010/8/9 Nick Coghlan
: On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky
wrote: +PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A). This is correct. Don't think of Å as a kind of "A". It's its own letter, which sorts after Z in Swedish.
That's true, but IIRC there are a fairly large number of letters where different languages collate them in different positions. Is it worth actually asking appropriate humans to think about this, or would it be better to use Unicode code point order for simplicity?
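For reference, plain code point order is what Python's built-in `sorted` gives for strings. A minimal sketch, assuming the names are stored in precomposed (NFC) form; the sample list and the helper name `code_point_sorted` are illustrative only:

```python
import unicodedata

def code_point_sorted(names):
    """Sort names by raw Unicode code point after normalizing to NFC."""
    composed = [unicodedata.normalize("NFC", name) for name in names]
    return sorted(composed)

# "\u00c5" is Å (U+00C5), which lies above "Z" (U+005A).
sample = ["Ascher", "\u00c5strand", "Zadka", "Araujo"]
print(code_point_sorted(sample))  # Åstrand lands after Zadka
print(hex(ord("Z")), hex(ord("\u00c5")), hex(ord("\u00f6")))  # 0x5a 0xc5 0xf6
```

Without the normalization step, a decomposed Å (A followed by U+030A) would sort under A instead, so code point order is simple but still depends on how the file is written.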
2010/8/10 Stephen J. Turnbull
Benjamin Peterson writes:
2010/8/9 Nick Coghlan:
On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky wrote:
+PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand
From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A).
This is correct. Don't think of Å as a kind of "A". It's its own letter, which sorts after Z in Swedish.
That's true, but IIRC there are a fairly large number of letters where different languages collate them in different positions.
Is it worth actually asking appropriate humans to think about this, or would it be better to use Unicode code point order for simplicity?
I think it's largely an unimportant discussion. If people have an opinion of where their name should appear, they can by all means change it. However, "rough" is probably as good as it'll ever get. -- Regards, Benjamin
On Tue, Aug 10, 2010 at 1:53 AM, "Martin v. Löwis"
People need to recognize that any kind of reference is really irrelevant here. There is no "right" order that is better than any other "right" order. I'd personally object to any English language dictionary telling me how my name sorts in the alphabet.
Even when an English language dictionary follows German rules? :-) BTW, I did quietly bring Peter Åstrand back to the end of the list yesterday and I agree that this is rather unimportant.
(and yes, I do think it's "wrong" that it got sorted after Lyngvig - in Germany, we put the ö as if it was "oe" - unlike the Swedes, who put the very same letter after the rest of the alphabet. So the ö in Chrigström sorts in a different way than the ö in Löwis. If I move to Sweden, the file would have to change :-)
I did search the mail archives for the discussion of Å's sorting order, and now I think that the reference to Swedish rules is an ex-post rationalization. It looks like the original order was by Latin-1 code point, and that explains both the Å and ö positions. (I actually believe that the Swedish rules are fairly modern as well. Unlike other nations, Swedes don't mind breaking with traditions for modern conveniences. As far as I know, Sweden is the only nation where polite "you" (plural) was abolished by a language reform.)
I raised this issue after one of my early check-ins got a response that acknowledgments should be alphabetized rather than added at the end of the list. [1] I pointed out that, given that the file is encoded in UTF-8, it can potentially have last names starting with any Unicode character, and I was not familiar with any formal procedure that would define an alphabetic order in this case. A short brainstorming session on IRC and the tracker resulted in an agreement that no formal rule exists and the best we can do is to define the order as "rough".
I am not 100% happy with this because I am sure people will keep discovering that the order in the file does not match the order suggested by their favorite sort program. I was also hoping to learn from this discussion what the state of the art in sorting Unicode words is. I believe this issue is addressed by some obscure parts of the Unicode standard, but I am not familiar with them.
[1] http://mail.python.org/pipermail/python-checkins/2010-May/093650.html
On 8/10/2010 9:13 AM, Benjamin Peterson wrote:
2010/8/10 Stephen J. Turnbull
: Benjamin Peterson writes:
2010/8/9 Nick Coghlan
: On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky
wrote: +PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A). This is correct. Don't think of Å as a kind of "A". It's its own letter, which sorts after Z in Swedish.
That's true, but IIRC there are a fairly large number of letters where different languages collate them in different positions.
Is it worth actually asking appropriate humans to think about this, or would it be better to use Unicode code point order for simplicity?
I think it's largely an unimportant discussion. If people have an opinion of where their name should appear, they can by all means change it. However, "rough" is probably as good as it'll ever get.
If I were committing a patch and was checking to see whether a name that started with a decorated A (or any other letter) were already in the list, I would look in the appropriate place in the A (or other) section, not after Z.
Everyone working on the English-based Python distribution knows the order of the 26 English letters. Please use that order (including for decorated versions and transliterations) instead of various idiosyncratic and possibly conflicting nationality-based rules.
For instance, suppose a 'Jean Charbol' posts a patch? Should we really have to ask his/her 'nationality' before adding the name to the list? Suppose 'Charbol' was born in Spain but works in France? In Spain, at least, 'ch' words are alphabetized in dictionaries between 'c' and 'd' words. Did everyone already know that? I am not even sure if all Spanish-speaking countries still do that. I am under the impression that either the Irish or Scots have some fussy rules for Mc/Mac/O names, but I don't know them and don't think we should observe them in our list.
Librarians who filed author cards by birth nationality rules made the now-obsolete card catalogs less useful for users who did not know both birth nationality and rule. Let's not repeat that mistake.
-- Terry Jan Reedy
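One way to approximate "English letter order, including for decorated versions" is to strip combining marks before comparing. A rough sketch using the standard `unicodedata` module; `english_key` is a name invented here, and the approach is an assumption about how the proposal might be implemented, not something the thread settled on:

```python
import unicodedata

def english_key(name):
    """Fold decorated Latin letters onto their base letters for sorting."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks such as the ring in Å or the diaeresis in ö.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

names = ["Ascher", "\u00c5strand", "Zadka", "Araujo", "Asbahr"]
print(sorted(names, key=english_key))  # Åstrand now sorts among the A names
```

Letters without a canonical decomposition (ø or ł, for example) pass through unchanged, and Greek or Cyrillic names would still need a separate transliteration step, so this covers only part of the proposal.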
On 8/10/2010 3:25 PM, Terry Reedy wrote:
Everyone working on the English-based Python distribution knows the order of the 26 English letters. Please use that order (including for decorated versions and transliterations) instead of various idiosyncratic and possibly conflicting nationality-based rules.
Since the list is now utf-8 instead of latin-1 encoded, we could include the actual native character name, if supplied, in parentheses after the English-alphabetized transliteration. If we were to follow native rules, all Japanese names, for instance, should be separately listed and ordered according to the Japanese order, which is quite different from the European orders. -- Terry Jan Reedy
2010/8/10 Terry Reedy
On 8/10/2010 9:13 AM, Benjamin Peterson wrote:
2010/8/10 Stephen J. Turnbull
Benjamin Peterson writes:
2010/8/9 Nick Coghlan:
On Tue, Aug 10, 2010 at 2:10 AM, alexander.belopolsky wrote:
+PS: In the standard Python distribution, this file is encoded in UTF-8 +and the list is in rough alphabetical order by last names.
David Abrahams Jim Ahlstrom @@ -28,6 +29,7 @@ Éric Araujo Jason Asbahr David Ascher +Peter Åstrand
From my recollection of the discussion when Peter was added, the first character in his last name actually sorts after Z (despite its resemblance to an A).
This is correct. Don't think of Å as a kind of "A". It's its own letter, which sorts after Z in Swedish.
That's true, but IIRC there are a fairly large number of letters where different languages collate them in different positions.
Is it worth actually asking appropriate humans to think about this, or would it be better to use Unicode code point order for simplicity?
I think it's largely an unimportant discussion. If people have an opinion of where their name should appear, they can by all means change it. However, "rough" is probably as good as it'll ever get.
If I were committing a patch and was checking to see whether a name that started with a decorated A (or any other letter) were already in the list, I would look in the appropriate place in the A (or other) section, not after Z.
Everyone working on the English-based Python distribution knows the order of the 26 English letters. Please use that order (including for decorated versions and transliterations) instead of various idiosyncratic and possibly conflicting nationality-based rules.
For instance, suppose a 'Jean Charbol' posts a patch? Should we really have to ask his/her 'nationality' before adding the name to the list?
No, but if he complains about it, we should change it.
Librarians who filed author cards by birth nationality rules made the now-obsolete card catalogs less useful for users who did not know both birth nationality and rule. Let's not repeat that mistake.
How often are people trying to search through Misc/ACKS, though? -- Regards, Benjamin
On Tue, Aug 10, 2010 at 3:25 PM, Terry Reedy
If I were committing a patch and was checking to see whether a name that started with a decorated A (or any other letter) were already in the list, I would look in the appropriate place in the A (or other) section, not after Z.
Everyone working on the English-based Python distribution knows the order of the 26 English letters. Please use that order (including for decorated versions and transliterations) instead of various idiosyncratic and possibly conflicting nationality-based rules.
I believe the gold standard for this type of work can be found in the index pages of The Art of Computer Programming, http://www-cs-faculty.stanford.edu/~knuth/help.html#exotic It would be quite an effort to redo Misc/ACKS in that way, and even with ASCII transliteration of every name, there is still ambiguity: is "Van Rossum" sorted under "V", or under "R"? (See http://www.python.org/~guido/ for an answer.) Since it is apparent that no formal rule can be agreed upon, I think best effort "rough alphabetical" order is just fine. BTW, what is Arfrever Frehtes Taifersar Arahesis' last name? :-)
On 8/10/2010 3:44 PM, Benjamin Peterson wrote:
No, but if he complains about it, we should change it.
If "In rough English alphabetical order" is extended with "unless the person requests otherwise", then it should also be extended with "in which case the name is suffixed with '(phbr)' [or something similar] for 'put here by request'" so that a later, diligent person seeking to improve the ordering will not think that it is out of standard order by accident or initial committer laziness and move it back. I believe we are having this discussion in part precisely because Astrand after Z was not so tagged and was thought to have just been quickly appended. -- Terry Jan Reedy
I am not 100% happy with this because I am sure people will keep discovering that the order in the file does not match the order suggested by their favorite sort program. I was also hoping to learn from this discussion what the state of the art in sorting Unicode words is. I believe this issue is addressed by some obscure parts of the Unicode standard, but I am not familiar with them.
Actually, it's not. Rather, Unicode acknowledges that collation depends on the locale; see http://unicode.org/reports/tr10/
Of course, it would be possible to follow the Default Unicode Collation Element Table (DUCET). Regards, Martin
If I were committing a patch and was checking to see whether a name that started with a decorated A (or any other letter) were already in the list, I would look in the appropriate place in the A (or other) section, not after Z.
Everyone working on the English-based Python distribution knows the order of the 26 English letters. Please use that order (including for decorated versions and transliterations) instead of various idiosyncratic and possibly conflicting nationality-based rules.
So where do you put Γεώργιος Μπουτσιούκης? Regards, Martin
On Wed, Aug 11, 2010 at 12:35 AM, Alexander Belopolsky
On Tue, Aug 10, 2010 at 6:29 PM, "Martin v. Löwis"
wrote: .. So where do you put Γεώργιος Μπουτσιούκης?
or Александр Белопольский for that matter? :-)
It seems James Tauber did a UCA implementation in Python: http://jtauber.com/blog/2006/01/27/python_unicode_collation_algorithm/
We could use this as a pre-commit hook to check changes on ACKS ;)
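The checking half of such a hook does not depend on which collation is chosen. A minimal sketch, where `out_of_order` and `last_name_key` are hypothetical names and the key function is a stand-in that could be swapped for a UCA-based key:

```python
def out_of_order(entries, key=lambda s: s):
    """Return adjacent (previous, current) pairs where current sorts first."""
    problems = []
    for prev, cur in zip(entries, entries[1:]):
        if key(cur) < key(prev):
            problems.append((prev, cur))
    return problems

def last_name_key(entry):
    """Stand-in key: compare on the last whitespace-separated word only."""
    return entry.split()[-1].casefold()

entries = ["David Abrahams", "Jim Ahlstrom", "David Ascher", "Jason Asbahr"]
for prev, cur in out_of_order(entries, key=last_name_key):
    print(f"{cur!r} should come before {prev!r}")
```

A hook built this way would only report suspicious pairs, leaving room for entries that are deliberately placed elsewhere.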
Am 11.08.2010 00:35, schrieb Alexander Belopolsky:
On Tue, Aug 10, 2010 at 6:29 PM, "Martin v. Löwis"
wrote: .. So where do you put Γεώργιος Μπουτσιούκης?
or Александр Белопольский for that matter? :-)
If you care about that, feel free to add that spelling to the file. Somebody proposed to put it along with some latin transliteration, which I can sympathize with. If just the nickname in cyrillic is fine with you, it's of course fine, as well. Regards, Martin
On Tue, Aug 10, 2010 at 6:50 PM, "Martin v. Löwis"
or Александр Белопольский for that matter? :-)
If you care about that, feel free to add that spelling to the file. Somebody proposed to put it along with some latin transliteration, which I can sympathize with.
That was Donald Knuth: http://www-cs-faculty.stanford.edu/~knuth/help.html#exotic
If just the nickname in cyrillic is fine with you, it's of course fine, as well.
I am more than happy with my entry in its current form. :-) BTW, does anybody know if Jiba = Jean-Baptiste LAMY ("Jiba")? CCing SF address to find out.
On 8/10/2010 6:29 PM, "Martin v. Löwis" wrote:
If I were committing a patch and was checking to see whether a name that started with a decorated A (or any other letter) were already in the list, I would look in the appropriate place in the A (or other) section, not after Z.
Everyone working on the English-based Python distribution knows the order of the 26 English letters. Please use that order (including for decorated versions and transliterations) instead of various idiosyncratic and possibly conflicting nationality-based rules.
So where do you put Γεώργιος Μπουτσιούκης?
As I said above, where the transliterated version Geor.. goes, with the transliteration followed by '(Γεώργιος Μπουτσιούκης)' as I suggested elsewhere -- Terry Jan Reedy
On Tue, 10 Aug 2010 15:25:52 -0400
Terry Reedy
Everyone working on the English-based Python distribution knows the order of the 26 English letters.
How does that solve anything? I just had to decide whether “Jason V. Miller” had to come before or after “Jay T. Miller” ('Jason' < 'Jay' but 'V' > 'T'). Knowledge of the “English” alphabet isn't enough to make a resolution: an idiosyncratic rule is still needed. (and before you claim that rule is well-known: I had to ask) Regards Antoine.
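The Miller example can be made concrete with two plausible tie-breaking keys that disagree. Both functions below are invented for illustration, and the sketch does not claim to show which rule the file actually follows:

```python
def split_name(name):
    """Split into (first, middle parts, last) on whitespace; a naive parse."""
    parts = name.split()
    return parts[0], parts[1:-1], parts[-1]

def first_then_middle(name):
    """Last name, then first name, then middle initials."""
    first, middle, last = split_name(name)
    return (last.casefold(), first.casefold(), [m.casefold() for m in middle])

def middle_then_first(name):
    """Last name, then middle initials, then first name."""
    first, middle, last = split_name(name)
    return (last.casefold(), [m.casefold() for m in middle], first.casefold())

millers = ["Jay T. Miller", "Jason V. Miller"]
print(sorted(millers, key=first_then_middle))  # Jason V. Miller first
print(sorted(millers, key=middle_then_first))  # Jay T. Miller first
```

Both keys use only the 26 English letters, yet they produce opposite orders, so some extra convention has to be picked either way.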
participants (8)
- "Martin v. Löwis"
- Alexander Belopolsky
- Antoine Pitrou
- Benjamin Peterson
- Nick Coghlan
- Stephen J. Turnbull
- Tarek Ziadé
- Terry Reedy