Glyphs and graphemes [was Re: Cult-like behaviour]

Chris Angelico rosuav at
Mon Jul 16 17:11:53 EDT 2018

On Tue, Jul 17, 2018 at 7:02 AM, Ethan Furman <ethan at> wrote:
> On 07/16/2018 01:15 PM, Chris Angelico wrote:
>> On Tue, Jul 17, 2018 at 4:55 AM, Steven D'Aprano wrote:
>>> There is nothing special about diacritics such that we ought to treat
>>> some combinations like "Ch" (two code points = one character) as "fixed
>>> width" while others like "â" (two code points = one character) as
>>> "variable width".
>> When you reverse a word, do you treat "ch" and "sh" as one character
>> or two? I'm of the opinion that they're single characters, and thus
>> this should be "dalokosh":
> Depends on the language:  in Spanish, "ch" is its own letter (at least it
> was when I grew up), so any word containing it should still contain it when
> reversed:  "chica" would be "acich".

Yeah. In Russian, "sh" is the single character "ш". I'm of the opinion
that, even after being transliterated into English phonetics, that
should be treated as a unit. ISO 9 uses "š" rather than "sh", which is
an improvement in character correspondence, but your average English
speaker is more likely to be able to pronounce "dalokosh" correctly
than to figure out "dalokoš". In the same way, I created a magic item
in a D&D campaign called "Yasham Burda", even though the more correct
spelling would be "Yaşam Burda" or even "Yasam Burda", for the benefit
of my monolingual players. But I'd still treat the "sh" as one character.
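
A rough sketch of that kind of reversal in Python, purely for
illustration: the function name reverse_units and the default digraph
list are made up here, and "shokolad" is only my assumption about which
word "dalokosh" is the reverse of. It splits a word into units, trying
each listed digraph before falling back to a single code point, and
then reverses the units rather than the code points. It makes no
attempt to handle combining diacritics, which would need proper
grapheme cluster segmentation.

```python
import re


def reverse_units(word, digraphs=("ch", "sh")):
    """Reverse a word, keeping each listed digraph together as one unit.

    The digraph list is an illustrative assumption; which pairs count
    as a single letter depends on the language being transliterated.
    """
    # Try the digraphs first, then fall back to any single code point.
    alternatives = [re.escape(d) for d in digraphs] + ["."]
    pattern = "|".join(alternatives)
    units = re.findall(pattern, word, flags=re.IGNORECASE | re.DOTALL)
    return "".join(reversed(units))


print(reverse_units("chica"))     # acich
print(reverse_units("shokolad"))  # dalokosh
```

Passing digraphs=() gives the plain code-point reversal, so the same
function covers both opinions in this thread.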

Ain't transliteration fun?

