Cult-like behaviour [was Re: Kindness]
rhodri at kynesim.co.uk
Mon Jul 16 15:56:11 EDT 2018
On 16/07/18 20:40, Marko Rauhamaa wrote:
> Terry Reedy<tjreedy at udel.edu>:
>> On 7/15/2018 5:28 PM, Marko Rauhamaa wrote:
>>> if your new system used Python3's UTF-32 strings as a foundation,
>> Since 3.3, Python's strings are not (always) UTF-32 strings.
> You are right. Python's strings are a superset of UTF-32. More
> accurately, Python's strings are UTF-32 plus surrogate characters.
>> Nor are they always UCS-2 (or partly UTF-16) strings. Nor are they
>> always Latin-1 or ASCII strings. Python's Flexible String
>> Representation uses the narrowest possible internal code for any
>> particular string. This is all transparent to the user except for
>> memory size.
> How CPython chooses to represent its strings internally is not what I'm
> talking about.
>>> UTF-32, after all, is a variable-width encoding.
>> Nope. It is a fixed-width (32 bits, 4 bytes) encoding.
>> Perhaps you should ask more questions before pontificating.
> You mean each code point is one code point wide. But that's rather an
> irrelevant thing to state. The main point is that UTF-32 (aka Unicode)
> uses one or more code points to represent what people would consider an
> individual character.
UTF-32 != Unicode, but that's a separate esoteric argument.
The problem everyone is having with you, Marko, is that you are using
the terminology incorrectly. When you say that more than one code point
can be used to represent what people would consider an individual
character, you are correct (and would be more correct if you called
"what people would consider an individual character" a "glyph"). When
you call UTF-32 a variable-width encoding, you are incorrect.
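
A minimal Python 3 sketch of the distinction, offered as an illustration
rather than anything definitive. The sample characters and the helper name
bytes_per_code_point are arbitrary choices made for this example. It shows
that UTF-32 spends the same four bytes on every code point, that UTF-8 does
not, and that a single perceived character can still be made of more than
one code point:

```python
import unicodedata

# Four sample code points: ASCII, Latin-1, BMP, and astral (illustrative picks).
samples = "a\u00e9\u20ac\U0001F600"


def bytes_per_code_point(text, encoding):
    """Return the encoded size in bytes of each code point in text."""
    return [len(ch.encode(encoding)) for ch in text]


# The "-le" variant is used so that no byte order mark is prepended.
utf32_widths = bytes_per_code_point(samples, "utf-32-le")
utf8_widths = bytes_per_code_point(samples, "utf-8")

# One perceived character spelled two ways:
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

print(utf32_widths)                      # same width for every code point
print(utf8_widths)                       # width depends on the code point
print(len(composed), len(decomposed))    # code point counts differ
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The width of the encoding and the number of code points per perceived
character are separate questions: the first is a property of UTF-32, the
second is a property of Unicode whatever the encoding.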
You are of course welcome to use whatever terminology you personally
like, like Humpty Dumpty. However, when you point to a duck and say
"That's a gnu," people are likely to stop taking you seriously.
Rhodri James *-* Kynesim Ltd