Grapheme clusters, a.k.a. real characters
Marko Rauhamaa
marko at pacujo.net
Sat Jul 15 10:01:21 EDT 2017
Steve D'Aprano <steve+python at pearwood.info>:
> On Sat, 15 Jul 2017 05:50 pm, Marko Rauhamaa wrote:
>> I might want random access to the "Grapheme clusters, a.k.a. real
>> characters".
>
> That would be nice to have, but the truth is that for most coders,
> Unicode code points are the low-hanging fruit that get you 95% of the
> way, and for many applications that's "close enough".
I think "close enough" is actually dangerous. We shouldn't encourage
that practice.
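
To make the danger concrete, here is a minimal sketch using only the
standard library. The sample string and the variable names are
illustrative assumptions, not anything from the thread. It shows the
same visible text having two different code point lengths, and
code-point slicing and reversal separating an accent from its base
letter.

```python
import unicodedata

# "café" written as "e" followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "cafe\u0301"
# NFC folds the pair into the single code point U+00E9
composed = unicodedata.normalize("NFC", decomposed)

# Same visible text, different code point counts
print(len(decomposed), len(composed))

# The two strings do not compare equal without normalization
print(decomposed == composed)

# Slicing by code point cuts between the "e" and its accent
truncated = decomposed[:4]
print(ascii(truncated))

# Reversing by code point leaves the accent in front of the "e"
reversed_text = decomposed[::-1]
print(ascii(reversed_text))
```

Normalizing to NFC hides the problem for this particular string, but
many sequences have no precomposed form, so normalization alone is not
a general fix.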
> Support for the Unicode grapheme breaking algorithm would get you
> probably 90% of the rest of the way. And then some sort of
> configurable system where defaults were based on the locale would
> probably get you a fairly complete grapheme-based text library.
Yes, that kind of a text class would be useful.
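
As a rough idea of what such a text class could look like, here is a
sketch under loudly stated assumptions. The name RoughText is
hypothetical, and the clustering rule (attach any code point with a
nonzero canonical combining class to the preceding one) is a crude
stand-in, NOT the Unicode grapheme breaking algorithm of UAX #29. It
ignores emoji ZWJ sequences, Hangul jamo, regional indicators and
locale tailoring.

```python
import unicodedata


class RoughText:
    """Hypothetical text class indexed by approximate clusters."""

    def __init__(self, s):
        self._clusters = []
        for ch in s:
            # Assumption: a nonzero canonical combining class means
            # "this code point belongs to the previous cluster".
            if self._clusters and unicodedata.combining(ch):
                self._clusters[-1] += ch
            else:
                self._clusters.append(ch)

    def __len__(self):
        # Length in clusters, not in code points
        return len(self._clusters)

    def __getitem__(self, index):
        picked = self._clusters[index]
        # A slice yields a list of clusters; join it back into a str
        return "".join(picked) if isinstance(index, slice) else picked

    def __str__(self):
        return "".join(self._clusters)


text = RoughText("cafe\u0301s")   # 6 code points
print(len(text))                  # counted in clusters
print(ascii(text[3]))             # the "e" together with its accent
print(ascii(text[:4]))            # slicing never splits a cluster
```

Building the cluster list up front costs O(n) time and memory but
gives O(1) random access afterwards, which is the trade-off a real
grapheme-based class would also have to consider.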
> I'm interested in such a thing. That's why I pointed out the issue on
> the bug tracker, to try to garner interest in it. As far as I can
> tell, you seem to be more interested in cheap point scoring, digs
> against Unicode, and an insistence that UTF-8 is better than strings
> (which doesn't even make sense).
It does seem to me UTF-8 is a better waiting position than strings.
Strings give you more trouble while not truly solving any problems.
Marko