Glyphs and graphemes [was Re: Cult-like behaviour]
steve+comp.lang.python at pearwood.info
Mon Jul 16 21:08:02 EDT 2018
On Tue, 17 Jul 2018 06:15:25 +1000, Chris Angelico wrote:
> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> There is nothing special about diacritics such that we ought to treat
>> some combinations like "Ch" (two code points = one character) as "fixed
>> width" while others like "â" (two code points = one character) as
>> "variable width".
> When you reverse a word, do you treat "ch" and "sh" as one character or
In English, "ch" is always two letters of the alphabet. In Welsh and
Czech, it can be one or two letters. (I think it will be two letters
only in loan words, but I'm not certain about that.) Whether that makes
it one or two characters depends on how you define "character".
Good luck with finding a universal, objective, unambiguous definition.
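
To make the code point versus character distinction concrete, here is a
minimal sketch using only the standard library's unicodedata module. The
variable names are just for illustration. It assumes the "â" in question
is an "a" plus COMBINING CIRCUMFLEX ACCENT, and shows that the count
depends on the normalization form you happen to pick:

```python
import unicodedata

# 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT (illustrative input)
decomposed = unicodedata.normalize("NFD", "a\u0302")
# NFC composes the pair into the single code point U+00E2
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))  # 2 code points
print(len(composed))    # 1 code point
print(len("Ch"))        # 2 code points, with no composed form to fall back on
```

Neither count tells you how many "characters" a Czech or Welsh reader
would see in "Ch"; len() only ever counts code points.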
> I'm of the opinion that they're single characters, and thus this
> should be "dalokosh":
> (It's the Russian for "chocolate" - "шоколад" - transliterated to
> English/Latin - "šokolad" or "shokolad" - and then reversed.)
In English, I think most people would prefer to use a different term for
whatever "sh" and "ch" represent than "character". But you make a good
point that even in English, we sometimes want to treat two-letter
combinations as a single unit.
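
As a rough sketch of what treating those combinations as units could look
like when reversing a word: the reverse_units function and the DIGRAPHS
tuple below are hypothetical names made up for this example, and the list
of digraphs is an assumption you would have to choose per language.

```python
import re

# Assumed set of two-letter units; which ones belong here is a per-language choice.
DIGRAPHS = ("ch", "sh")

def reverse_units(word, digraphs=DIGRAPHS):
    """Reverse word, keeping each listed digraph together as one unit."""
    # Try digraphs first (longest first), then fall back to any single code point.
    alternatives = [re.escape(d) for d in sorted(digraphs, key=len, reverse=True)]
    alternatives.append(".")
    pattern = re.compile("|".join(alternatives), re.IGNORECASE | re.DOTALL)
    units = pattern.findall(word)
    return "".join(reversed(units))

print(reverse_units("shokolad"))  # dalokosh
print("shokolad"[::-1])           # dalokohs
```

This only handles the digraphs you list, and it says nothing about
combining marks, so it is not a general answer to "what is a character".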
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson