Grapheme clusters, a.k.a. real characters
Rick Johnson
rantingrickjohnson at gmail.com
Sun Jul 16 00:52:41 EDT 2017
On Saturday, July 15, 2017 at 9:33:49 PM UTC-5, Ben Finney wrote:
> MRAB <python at mrabarnett.plus.com> writes:
[...]
> > Is linefeed a character? You might call it a "control
> > character", but it's not really a _character_, it's a
> > control/format _code_.
>
> And yet the ASCII and Unicode standards say code point 0x0A
> (U+000A LINE FEED) is a character, by definition. Rather
> than saying “no, it's not a character”, I think a more
> accurate statement would be: a linefeed *is* a character in
> ASCII, but that doesn't mean every other standard must
> agree. Indeed it may be better to say: a line feed is a
> character and is also a control code.
>
> > Is an acute accent a character?
>
> Yes, according to Unicode. ‘´’ (U+0301 COMBINING ACUTE
> ACCENT) is a character.
>
> > No, it's a diacritic mark that's added to a character.
>
> Lose the “no”, and I agree.
So you would be happy with a string containing a single
character that was _decorated_ with a single accent mark
(say, for instance, "a" followed by U+0303 COMBINING TILDE,
the decomposed form of U+00E3 (Latin Small Letter A with
Tilde)) returning a length value of 2? Really?
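
For what it's worth, here is a rough sketch of the length question in Python 3. U+00E3 on its own is a single precomposed code point, so len() reports 1; the 2 only shows up once the string is in decomposed form ("a" followed by U+0303 COMBINING TILDE). The helper count_clusters is a made-up name, and it is only a crude approximation that treats every non-combining code point as the start of a new cluster. It is not the full grapheme cluster segmentation from Unicode Standard Annex #29.

```python
import unicodedata

# Precomposed form: one code point, LATIN SMALL LETTER A WITH TILDE.
precomposed = "\u00e3"

# Decomposed form (NFD): "a" followed by U+0303 COMBINING TILDE.
decomposed = unicodedata.normalize("NFD", precomposed)


def count_clusters(text):
    """Hypothetical helper: count code points whose canonical combining
    class is 0, i.e. treat each non-combining code point as starting a
    new "real character". A crude stand-in for UAX #29 segmentation."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(len(precomposed))            # code points in the precomposed form
print(len(decomposed))             # code points in the decomposed form
print(count_clusters(decomposed))  # approximate "real character" count
```

Both spellings display as the same letter, yet len() disagrees between them, while the approximate cluster count stays the same for either form.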
> It's entirely reasonable for a concept to fit in multiple
> categories simultaneously.
Reasonable? Perhaps...
Practical? No way!