Cult-like behaviour [was Re: Kindness]

Rhodri James rhodri at
Mon Jul 16 15:56:11 EDT 2018

On 16/07/18 20:40, Marko Rauhamaa wrote:
> Terry Reedy<tjreedy at>:
>> On 7/15/2018 5:28 PM, Marko Rauhamaa wrote:
>>> if your new system used Python3's UTF-32 strings as a foundation,
>> Since 3.3, Python's strings are not (always) UTF-32 strings.
> You are right. Python's strings are a superset of UTF-32. More
> accurately, Python's strings are UTF-32 plus surrogate characters.
>> Nor are they always UCS-2 (or partly UTF-16) strings. Nor are they
>> always Latin-1 or ASCII strings. Python's Flexible String
>> Representation uses the narrowest possible internal code for any
>> particular string. This is all transparent to the user except for
>> memory size.
> How CPython chooses to represent its strings internally is not what I'm
> talking about.
>>> UTF-32, after all, is a variable-width encoding.
>> Nope.  It's a fixed-width (32 bits, 4 bytes) encoding.
>> Perhaps you should ask more questions before pontificating.
> You mean each code point is one code point wide. But that's rather an
> irrelevant thing to state. The main point is that UTF-32 (aka Unicode)
> uses one or more code points to represent what people would consider an
> individual character.
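
Terry's description of the Flexible String Representation (PEP 393) can be observed from Python itself. This is a minimal sketch under stated assumptions: it relies on sys.getsizeof, whose numbers are a CPython implementation detail, and on a hypothetical helper, bytes_per_char, that measures the marginal cost of one more character by comparing two string lengths so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate CPython's internal storage width for a string built from ch.

    Hypothetical helper: compares strings of 1000 and 2000 copies of ch so
    the constant object header cancels. CPython-specific (PEP 393); other
    Python implementations store strings differently.
    """
    shorter = sys.getsizeof(ch * 1000)
    longer = sys.getsizeof(ch * 2000)
    return (longer - shorter) // 1000


for sample in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    # Under PEP 393: code points up to U+00FF use 1 byte, the rest of the
    # BMP uses 2 bytes, and astral code points use 4 bytes.
    print(f"U+{ord(sample):04X}: {bytes_per_char(sample)} byte(s) per character")
```

On CPython 3.3 or later this should report 1, 1, 2 and 4 bytes respectively, while len() of each sample string stays 1: the storage width changes, the code-point sequence the user sees does not.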

UTF-32 != Unicode, but that's a separate esoteric argument.
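
To make the distinction concrete: Unicode defines code points, and UTF-8, UTF-16 and UTF-32 are different ways of encoding them as bytes. A Python str is a sequence of code points, and it can even hold a lone surrogate such as U+D800, which is a code point but not a Unicode scalar value, so a strict UTF-32 encoder refuses it. The sketch below assumes two hypothetical helper names, encodings_of and encodable_as_utf32, chosen only for illustration.

```python
def encodings_of(text: str) -> dict:
    """Encode the same code points with three Unicode encoding forms."""
    return {name: text.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}


def encodable_as_utf32(text: str) -> bool:
    """Return True if text can be encoded as strict UTF-32."""
    try:
        text.encode("utf-32-le")
    except UnicodeEncodeError:
        # Raised for lone surrogates (U+D800..U+DFFF).
        return False
    return True


# One code point (EURO SIGN), three different byte sequences.
for name, data in encodings_of("\u20ac").items():
    print(name, data.hex())

# A lone surrogate is a legal one-element str, but not valid UTF-32.
lone = "\ud800"
print(len(lone), encodable_as_utf32(lone))
```

This is the sense in which Marko's "UTF-32 plus surrogate characters" description of str holds, and also why "UTF-32" and "Unicode" are not interchangeable names.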

The problem everyone is having with you, Marko, is that you are using 
the terminology incorrectly.  When you say that more than one codepoint 
can be used to represent what people would consider an individual 
character, you are correct (and would be more correct if you called 
"what people would consider an individual character" a "glyph").  When 
you call UTF-32 a variable-width encoding, you are incorrect.
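
Both halves of that point can be checked with the standard library. The sketch below (utf32_width is a hypothetical helper name) shows that UTF-32 spends exactly 4 bytes on every code point while UTF-8 does not, and that a single user-perceived character may still take more than one code point. Note that the standard library does not segment text into user-perceived characters (grapheme clusters, defined in Unicode UAX #29); that generally needs a third-party package such as regex with its \X pattern.

```python
import unicodedata


def utf32_width(text: str) -> int:
    """Bytes per code point in UTF-32 for a non-empty string: always 4."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(text.encode("utf-32-le")) // len(text)


samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
print([utf32_width(s) for s in samples])          # [4, 4, 4, 4]: fixed width
print([len(s.encode("utf-8")) for s in samples])  # [1, 2, 3, 4]: variable width

# One user-perceived character, two code points: "e" + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
print(len(decomposed), len(decomposed.encode("utf-32-le")))  # 2 8

# NFC composes this pair into the single code point U+00E9, but not every
# combination has a precomposed form: "q" + acute stays two code points.
print(len(unicodedata.normalize("NFC", decomposed)))  # 1
print(len(unicodedata.normalize("NFC", "q\u0301")))   # 2
```

The encoding stays fixed-width throughout; what varies is how many code points make up one character as a reader perceives it.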

You are of course welcome to use whatever terminology you personally 
like, as Humpty Dumpty did.  However, when you point to a duck and say 
"That's a gnu," people are likely to stop taking you seriously.

Rhodri James *-* Kynesim Ltd
