Grapheme clusters, a.k.a.real characters
Marko Rauhamaa
marko at pacujo.net
Tue Jul 18 13:01:10 EDT 2017
Chris Angelico <rosuav at gmail.com>:
> what you're more likely to want is "match the letter á", and you don't
> care whether it's represented as U+0061 U+0301 or as U+00E1. That's
> where Unicode normalization comes in.
Yes. Also, not every letter can be normalized to a single codepoint so
NFC is not a way out. For example,
re.match("^[q̈]$", "q̈")
returns None regardless of normalization.
Marko
More information about the Python-list
mailing list