Glyphs and graphemes [was Re: Cult-like behaviour]
Terry Reedy
tjreedy at udel.edu
Mon Jul 16 15:28:51 EDT 2018
On 7/16/2018 1:11 PM, Richard Damon wrote:
> Many consider that UTF-32 is a variable-width encoding because of the combining characters. It can take multiple ‘codepoints’ to define what should be a single ‘character’ for display.
I hope you realize that this is not the standard meaning of
'variable-width encoding', which is 'variable number of bytes per
codepoint'. By that definition, UTF-16 and UTF-8 are variable width
and UTF-32 is not. If one expands the definition enough, ASCII is
'variable width' because 'fi' is two bytes, or more realistically,
because <= and >= each take two bytes instead of being a single
character (as they can be in Unicode!).
If one is using a broader definition than usual, it is clearer to say so.
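
To make the two meanings concrete, here is a minimal Python sketch. The helper name encoded_widths and the sample characters are chosen here for illustration only. It first counts bytes per codepoint in each encoding (the standard meaning), then shows a combining sequence that takes two codepoints for one displayed character (the broader meaning).

```python
import unicodedata

def encoded_widths(ch):
    """Return how many bytes a single codepoint occupies in each encoding."""
    # The -le variants are used so that no byte-order mark is counted.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

# Standard meaning: bytes per codepoint.
# UTF-8 and UTF-16 vary; UTF-32 is always 4.
for ch in ("a", "\u00e9", "\u20ac", "\U0001f600"):
    print(f"U+{ord(ch):04X}", encoded_widths(ch))

# Broader meaning: codepoints per displayed character.
# 'e' followed by U+0301 COMBINING ACUTE ACCENT displays as one character.
combined = "e\u0301"
print(len(combined))                                 # 2 codepoints
print(len(combined.encode("utf-32-le")))             # 8 bytes, 4 per codepoint
print(len(unicodedata.normalize("NFC", combined)))   # 1 codepoint (U+00E9)

# The '<=' example: two ASCII bytes versus the single codepoint U+2264.
print(len("<=".encode("ascii")), len("\u2264"))      # 2 1
```

In both the combining case and the '<=' case, each individual codepoint still has a fixed size in UTF-32; what varies is how many codepoints make up one displayed character.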
--
Terry Jan Reedy