Glyphs and graphemes [was Re: Cult-like behaviour]
steve+comp.lang.python at pearwood.info
Mon Jul 16 21:27:18 EDT 2018
On Mon, 16 Jul 2018 15:28:51 -0400, Terry Reedy wrote:
> On 7/16/2018 1:11 PM, Richard Damon wrote:
>> Many consider that UTF-32 is a variable-width encoding because of the
>> combining characters. It can take multiple ‘codepoints’ to define what
>> should be a single ‘character’ for display.
> I hope you realize that this is not the standard meaning of
> 'variable-width encoding', which is 'variable number of bytes for a
A minor correction, Terry: it is the number of code units, not bytes.
UTF-8 uses 1-byte code units, and from 1 to 4 code units per code point;
UTF-16 uses 2-byte code units (a 16-bit word), and 1 or 2 words per code
point;
UTF-32 uses 4-byte code units (a 32-bit word), and only ever a single
code unit for every code point.
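
For anyone who wants to check this at the interpreter, here is a rough
sketch, assuming Python 3. The helper name code_units is made up for
this example and is not part of any library; it counts code units by
dividing the encoded length by the code unit size. The explicit
little-endian codecs are used so that no byte order mark is added to
the output.

```python
# Size in bytes of one code unit for each encoding form.
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}

def code_units(text, encoding):
    """Return the number of code units needed to encode text.

    Hypothetical helper, for illustration only.
    """
    return len(text.encode(encoding)) // UNIT_SIZE[encoding]

# One code point each: ASCII, Latin-1 range, rest of the BMP, astral.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    counts = [code_units(ch, enc) for enc in UNIT_SIZE]
    print("U+%04X" % ord(ch), counts)

# Prints:
# U+0061 [1, 1, 1]
# U+00E9 [2, 1, 1]
# U+20AC [3, 1, 1]
# U+1F600 [4, 2, 1]

# A base letter plus a combining accent is two code points, and UTF-32
# still uses exactly one code unit for each of them.
combined = "e\u0301"
print(len(combined), code_units(combined, "utf-32-le"))
# Prints: 2 2
```

The last two lines touch on Richard's point: what displays as a single
character can take two code points, but each of those code points is
still a single UTF-32 code unit.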
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson