Glyphs and graphemes [was Re: Cult-like behaviour]

Richard Damon Richard at Damon-family.org
Mon Jul 16 13:11:23 EDT 2018


> On Jul 16, 2018, at 12:51 PM, Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> 
>> On Mon, 16 Jul 2018 00:28:39 +0300, Marko Rauhamaa wrote:
>> 
>> if your new system used Python3's UTF-32 strings as a foundation, that
>> would be an equally naïve misstep. You'd need to reach a notch higher
>> and use glyphs or other "semiotic atoms" as building blocks. UTF-32,
>> after all, is a variable-width encoding.
> 
> Python's strings aren't UTF-32. They are sequences of abstract code 
> points.
> 
> UTF-32 is not a variable-width encoding.
> 
> -- 
> Steven D'Aprano
> 

Many consider UTF-32 to be a variable-width encoding because of combining characters. Each code point occupies a fixed four bytes, but it can take multiple ‘code points’ to define what should be a single ‘character’ for display.
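
A minimal sketch of this in Python 3, assuming only the standard unicodedata module. The helper name utf32_units is hypothetical, made up for this illustration. It writes the same displayed character "é" two ways and counts the code points and UTF-32 code units each one needs:

```python
import unicodedata

# One user-perceived character, "é", written two ways.
composed = "\u00e9"     # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # two code points: "e" + COMBINING ACUTE ACCENT


def utf32_units(s):
    """Count the 4-byte UTF-32 code units in s (hypothetical helper).

    "utf-32-le" is used so that no byte-order mark is prepended.
    """
    return len(s.encode("utf-32-le")) // 4


print(len(composed), utf32_units(composed))      # 1 1
print(len(decomposed), utf32_units(decomposed))  # 2 2

# The two strings display alike but are different code point sequences.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Normalizing to NFC happens to collapse this pair into one code point, but not every combining sequence has a precomposed form, so normalization alone does not make one code point equal one displayed character.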

