Grapheme clusters, a.k.a. real characters
Marko Rauhamaa
marko at pacujo.net
Fri Jul 14 10:14:39 EDT 2017
Rhodri James <rhodri at kynesim.co.uk>:
> On 14/07/17 14:31, Marko Rauhamaa wrote:
>> Of course, UTF-8 in a bytes object doesn't make the situation any
>> better, but does it make it any worse?
>
> Speaking as someone who has been up to his elbows in this recently, I
> would say emphatically that it does make things worse. It adds an
> extra layer of complexity to all of the questions you were asking, and
> more. A single codepoint is a meaningful thing, even if its meaning
> may be modified by combining. A single byte may or may not be
> meaningful.
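
To make the claim concrete, here is a minimal sketch of how I read it. It is my own illustration, not Rhodri's; the sample string and the helper name `decodes` are assumptions. Every code point in a str has a name of its own, even a combining one, while a bytes prefix that ends inside a multi-byte UTF-8 sequence does not decode at all.

```python
import unicodedata

# Hypothetical sample: 'e' followed by COMBINING ACUTE ACCENT.
# One grapheme cluster, two code points, three UTF-8 bytes.
s = "e\u0301"
b = s.encode("utf-8")

# Each code point is individually meaningful: it has a Unicode name.
names = [unicodedata.name(c) for c in s]


def decodes(chunk: bytes) -> bool:
    """Return True if chunk is valid UTF-8 on its own (hypothetical helper)."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Cut the byte string at every possible offset and see which prefixes
# are still valid UTF-8. The cut after byte 2 falls inside the
# two-byte encoding of U+0301, so that prefix cannot be decoded.
byte_prefix_ok = [decodes(b[:i]) for i in range(len(b) + 1)]

print(len(s), len(b))
print(names)
print(byte_prefix_ok)
```

Any slice of the str is still a valid str, although it may separate the accent from its base letter. With bytes, the slice itself can be invalid, and that is on top of the combining problem.
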
I'd like to understand this better. Maybe you have a couple of examples
to share?
Marko