Glyphs and graphemes [was Re: Cult-like behaviour]

Richard Damon Richard at
Mon Jul 16 13:11:23 EDT 2018

> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano <steve+comp.lang.python at> wrote:
>> On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:
>> if your new system used Python3's UTF-32 strings as a foundation, that
>> would be an equally naïve misstep. You'd need to reach a notch higher
>> and use glyphs or other "semiotic atoms" as building blocks. UTF-32,
>> after all, is a variable-width encoding.
> Python's strings aren't UTF-32. They are sequences of abstract code 
> points.
> UTF-32 is not a variable-width encoding.
> -- 
> Steven D'Aprano

Many people consider UTF-32 to be, in effect, a variable-width encoding because of combining characters. Each code point is a fixed four bytes, but it can take multiple ‘code points’ to define what should be a single ‘character’ for display.
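
A minimal sketch of the point about combining characters, assuming Python 3 and the standard unicodedata module (the variable names are just for illustration). It builds "é" two ways and counts code points in each:

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points, one displayed character
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

# len() on a Python 3 str counts code points, not displayed characters
print(len(decomposed))  # 2
print(len(composed))    # 1

# UTF-32 spends exactly 4 bytes per code point ("utf-32-le" writes no BOM)
print(len(decomposed.encode("utf-32-le")))  # 8
print(len(composed.encode("utf-32-le")))    # 4

# The two strings display the same but do not compare equal without normalization
print(decomposed == composed)  # False
```

So the encoding is fixed-width per code point, yet the number of code points per displayed character still varies. Not every combining sequence has a precomposed form, so normalization alone does not make the two counts agree in general.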
