Glyphs and graphemes [was Re: Cult-like behaviour]

Chris Angelico rosuav at gmail.com
Mon Jul 16 14:51:55 EDT 2018


On Tue, Jul 17, 2018 at 4:22 AM, Richard Damon <Richard at damon-family.org> wrote:
>
> But I am not talking about those sorts of characters or ligatures, but ‘characters’ that are built up of combining diacritical marks (like accents) and a base character. Unicode defines code points for the more common of these, but not for many others.
>

So, you're talking about "grapheme clusters". Those can be arbitrarily
large and complex. Trolls revel in the ability to adorn base
characters with ridiculous numbers of "dripping" marks, making it hard
to type their names. Since the amount of information in one grapheme
cluster is (as far as I know) potentially infinite, it's fundamentally
impossible to create a fixed-size encoding that can represent them. If
I'm wrong about the possibilities being infinite, then they are
certainly very extensive, as there are MANY combining characters
available (the only question is whether you can use the same
characters multiple times, in which case there are infinite
possibilities, or if not, in which case the possibilities are
base_characters*2^combining_characters aka "virtually infinite").

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
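
To make the "arbitrarily large" point concrete, here is a rough Python
sketch. The helper naive_clusters is hypothetical and is only an
approximation: it glues every code point with a nonzero canonical
combining class onto the preceding code point, which is far simpler
than the real UAX #29 rules (it ignores ZWJ sequences, Hangul jamo,
regional indicators and so on).

```python
import unicodedata

def naive_clusters(text):
    """Group each code point with the combining marks that follow it.

    Hypothetical helper: a crude stand-in for UAX #29 segmentation,
    based only on the canonical combining class of each code point.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            # Nonzero combining class: attach to the previous cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# One base character carrying a hundred copies of the same mark.
dripping = "a" + ACUTE * 100

print(len(dripping))                  # 101 code points
print(len(naive_clusters(dripping)))  # 1 cluster under this approximation
```

Nothing in the loop stops at 100; the cluster grows for as long as
combining marks keep coming, which is why no fixed-size unit can hold
one.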

Grapheme clustering is a display feature, not an input feature, and
certainly not a string representation feature.
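
As a small illustration of the representation side, assuming nothing
beyond the standard unicodedata module: a Python str is a sequence of
code points, and normalization only merges a base plus mark where
Unicode happens to define a precomposed code point. The variable names
below are just for the example.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

# "e" + acute has a precomposed form (U+00E9), so NFC merges it.
e_acute = unicodedata.normalize("NFC", "e" + ACUTE)

# "q" + acute has no precomposed form, so NFC leaves two code points.
q_acute = unicodedata.normalize("NFC", "q" + ACUTE)

print(len(e_acute))  # 1
print(len(q_acute))  # 2

# NFD goes the other way and splits the precomposed character again.
print(len(unicodedata.normalize("NFD", "\u00e9")))  # 2
```

Both strings display as a single accented letter, but len() and
indexing see code points either way.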

ChrisA

