Grapheme clusters, a.k.a. real characters
Gregory Ewing
greg.ewing at canterbury.ac.nz
Wed Jul 19 01:51:49 EDT 2017
Chris Angelico wrote:
> Once you NFC or NFD normalize both strings, identical strings will
> generally have identical codepoints... You should then be able to use normal regular expressions to
> match correctly.
Except that if you want to match a set of characters,
you can't reliably use [...], you would have to write
them out as alternatives in case some of them take
up more than one code point.
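Here is a minimal sketch of the problem, assuming Python 3's `re` and `unicodedata` modules. The sample characters are illustrative choices. "e" plus a combining acute accent composes to a single code point under NFC. A regional-indicator flag (GB) stays two code points after NFC. Inside a [...] class, the flag is split into two independent members, so each match catches only half of it. An alternation keeps it whole. `any_of` is a hypothetical helper written for this example, not part of any library.

```python
import re
import unicodedata

# "e" + COMBINING ACUTE ACCENT: NFC composes this into one code point (U+00E9).
e_acute = unicodedata.normalize("NFC", "e\u0301")
# Regional indicators G + B form one flag; NFC leaves it as two code points.
flag_gb = unicodedata.normalize("NFC", "\U0001F1EC\U0001F1E7")

print(len(e_acute), len(flag_gb))  # 1 2

text = unicodedata.normalize("NFC", "caf\u00e9 \U0001F1EC\U0001F1E7!")

# Naive character class: the flag contributes two separate class members,
# so findall returns "é" plus each half of the flag as its own match.
naive = re.compile("[" + e_acute + flag_gb + "]")
print(naive.findall(text))

# Hypothetical helper: build an alternation instead of a class.
# Longest alternatives come first so a multi-code-point character
# is tried before any shorter string that is a prefix of it.
def any_of(chars):
    alts = sorted(set(chars), key=lambda c: (-len(c), c))
    return re.compile("|".join(re.escape(c) for c in alts))

good = any_of([e_acute, flag_gb])
print(good.findall(text))  # "é" and the whole flag, as two matches
```

The naive class also matches a lone regional indicator G inside an unrelated flag. The alternation only matches the complete sequence.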
--
Greg
More information about the Python-list mailing list