Grapheme clusters, a.k.a. real characters
Rhodri James
rhodri at kynesim.co.uk
Fri Jul 14 10:52:10 EDT 2017
On 14/07/17 15:32, Michael Torrie wrote:
> On 07/14/2017 08:05 AM, Rhodri James wrote:
>> On 14/07/17 14:31, Marko Rauhamaa wrote:
>>> Of course, UTF-8 in a bytes object doesn't make the situation any
>>> better, but does it make it any worse?
>>
>> Speaking as someone who has been up to his elbows in this recently, I
>> would say emphatically that it does make things worse. It adds an extra
>> layer of complexity to all of the questions you were asking, and more.
>> A single codepoint is a meaningful thing, even if its meaning may be
>> modified by combining. A single byte may or may not be meaningful.
>
> Are you saying that dealing with Unicode in Google Go, which uses UTF-8
> in memory, is adding an extra layer of complexity and makes things worse
> than they might be in Python?
I'm not familiar with Go. If the programmer has to be aware that
she is using UTF-8 under the hood, then yes, it does add an extra layer
of complexity. You have to remember the rules of UTF-8 as well as
everything else.
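
As a rough illustration of the codepoint-versus-byte point, here is a
minimal Python 3 sketch. The names text, data and decodes_alone are made
up for this example. It takes "e" followed by a combining acute accent,
looks up the name of each codepoint, and then checks whether each byte
of the UTF-8 encoding can be decoded by itself.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two codepoints, one grapheme.
text = "e\u0301"
data = text.encode("utf-8")  # three bytes: 0x65, 0xCC, 0x81

# Every codepoint has a name and properties of its own, even the combining one.
names = [unicodedata.name(c) for c in text]


def decodes_alone(b: int) -> bool:
    """Return True if the single byte b is valid UTF-8 by itself."""
    try:
        bytes([b]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Only the ASCII byte stands alone; the other two are fragments of U+0301.
flags = [decodes_alone(b) for b in data]

print(names)  # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
print(flags)  # [True, False, False]
```

Slicing text always lands on a codepoint boundary, whereas slicing data
can land in the middle of an encoded codepoint, which is one of the
UTF-8 rules you would have to keep in mind.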
--
Rhodri James *-* Kynesim Ltd