Guido van Rossum, 01.09.2011 18:31:
On Thu, Sep 1, 2011 at 9:03 AM, Antoine Pitrou wrote:
On Thursday, 1 September 2011 at 08:45 -0700, Guido van Rossum wrote:
This is definitely thought of as a separate mark added to the e; ë is not a new letter. I have a feeling it's the same way for the French and Germans, but I really don't know. (Antoine? Georg?)
Indeed, they are not separate "letters" (they are considered the same in lexicographic order, and the French alphabet has 26 letters).
So does the German alphabet, even though that does not include "ß", which basically descended from a ligature of the old German way of writing "sz", where "s" looked similar to an "f" and "z" had a low hanging tail. IIRC, German Umlaut letters are lexicographically sorted according to their emergency replacement spelling ("ä" -> "ae"), which is also sometimes used in all upper case words ("Glück" -> "GLUECK"). I guess that's because Umlaut dots are harder to see on top of upper case letters. So, Latin-1 byte value sorting always yields totally wrong results. That aside, Umlaut letters are commonly considered separate letters, different from the undotted letters and also different from the replacement spellings. I, for one, always found the replacements rather weird and never got used to using them in upper case words. In any case, it's wrong to always use them, and it makes text harder to read.
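As a rough illustration of the sorting point, here is a minimal Python sketch that compares Latin-1 byte-value ordering with a key built from the replacement spellings. The names `replacement_key` and `_UMLAUT_EXPANSIONS` are made up for this example, the word list is arbitrary, and real applications would more likely rely on a locale-aware collation library than on a hand-written table.

```python
# Hypothetical table: maps each lower-case umlaut to its replacement spelling.
_UMLAUT_EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def replacement_key(word):
    """Sort key that compares umlauts by their replacement spelling.

    casefold() lower-cases the word and already maps "ß" to "ss",
    so only the three umlauts need an explicit table entry.
    """
    return word.casefold().translate(_UMLAUT_EXPANSIONS)


words = ["Zebra", "Äpfel", "Apfel", "Azur"]

# Byte-value ordering: "Ä" is 0xC4 in Latin-1, so it sorts after "Z".
by_latin1 = sorted(words, key=lambda w: w.encode("latin-1"))

# Replacement-spelling ordering: "Äpfel" is compared as "aepfel".
by_replacement = sorted(words, key=replacement_key)

print(by_latin1)       # ['Apfel', 'Azur', 'Zebra', 'Äpfel']
print(by_replacement)  # ['Äpfel', 'Apfel', 'Azur', 'Zebra']
```

The same key turns "Glück" into "glueck", which matches the upper-case replacement spelling mentioned above apart from case.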
But I'm not sure how it's relevant, because you can't remove an accent without most likely making a spelling error, or at least changing the meaning. Accents are very much part of the language (while ligatures like "ff" are not, they are a rendering detail). So I would consider "é", "ê", "ù", etc. atomic characters for the purpose of processing French text. And I don't see how a decomposed form could help an application.
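To make the composed-versus-decomposed distinction concrete, here is a small sketch using Python's standard `unicodedata` module. The helper `strip_accents` is a hypothetical name introduced only for this example, and the three French words are my own choice, not taken from the thread. It shows that the decomposed form stores each accent as a separate combining code point, and that dropping those code points collapses distinct words into one spelling.

```python
import unicodedata


def strip_accents(text):
    """Decompose to NFD, then drop every combining mark (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "pêché" written with precomposed code points (NFC), spelled with escapes
# so the example does not depend on how this file itself is normalized.
word = "p\u00each\u00e9"
decomposed = unicodedata.normalize("NFD", word)

print(len(word))        # 5 code points: each accented letter is one code point
print(len(decomposed))  # 7 code points: base letters plus two combining marks

# Three different French words end up with the same accent-free spelling.
variants = ["p\u00eache", "p\u00e9ch\u00e9", "p\u00each\u00e9"]  # pêche, péché, pêché
print({strip_accents(v) for v in variants})  # {'peche'}
```

Recomposing with `unicodedata.normalize("NFC", decomposed)` gives back the original string, so the two forms carry the same information; only the accent-stripping step loses it.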
I recall long ago that when the French wrote words in all caps they would drop the accents, e.g. ECOLE. I even recall (through the mists of time) observing this in Paris on public signs. Is this still the convention?
Yes, and it's a huge problem when trying to pronounce last names. In French, you'd commonly write LASTNAME, Firstname, and if LASTNAME happens to have accented letters, you'd miss them when reading it. I know a couple of French people who suffer severely from this, because their name takes on a totally different meaning when pronounced without the accents.

Stefan