OK, I thought there was also a form normalized (denormalized?) to the decomposed form. But I'll take your word for it.
If I understood the example correctly, he needs a mixed form, with some characters decomposed and some composed (depending on which one looks better in the given font). I agree that this sounds more like a font problem, but it is a widespread font problem, and it may be necessary to address it in an application. But this is only one example of why an application-specific concept of graphemes, different from the Unicode-defined normal forms, can be useful.

I think the very concept of a grapheme is context-, language-, and culture-specific. For example, in Chinese Pinyin it would be very natural to write tone marks with combining diacritics (i.e. in decomposed form). But then you have the vowel "ü", and it would be strange to decompose it into a "u" and a combining diaeresis. So conceptually the most sensible representation of "lǜ" is neither the composed nor the decomposed normal form, and depending on its needs an application might want to represent it in a mixed form (composing the diaeresis with the "u", but leaving the grave accent separate). There must be many more examples where the conceptual context determines the right composition, like "ñ", which in Spanish is certainly a grapheme, but in mathematics might be better represented as an "n" with a combining tilde.

The bottom line is that, while an array of Unicode code points is certainly a generally useful data type (and PEP 393 is a great improvement in this regard), an array of graphemes carries many subtleties and may not be nearly as universal. Support in the spirit of unicodedata's normalization functions etc. is certainly a good thing, but we shouldn't assume that everyone will want Python to do their graphemes for them.

- Hagen
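P.S. For anyone who wants to see the three representations of "lǜ" concretely, here is a small sketch using Python's standard unicodedata module. The code points involved are from the Unicode character database: U+01DC is the fully precomposed "ǜ", U+00FC is the precomposed "ü", and U+0308/U+0300 are the combining diaeresis and combining grave accent.

```python
import unicodedata

# Fully composed form: "l" + U+01DC (LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE)
composed = "l\u01dc"

# NFD fully decomposes: "l" + "u" + combining diaeresis + combining grave
nfd = unicodedata.normalize("NFD", composed)
print([hex(ord(c)) for c in nfd])   # four code points

# NFC composes everything back into the single precomposed character
nfc = unicodedata.normalize("NFC", nfd)
print(nfc == composed)

# The "mixed" form discussed above: keep the base letter as the
# precomposed "ü" (U+00FC), but leave the tone mark (the grave accent)
# as a separate combining character. It is canonically equivalent to
# both normal forms, yet it is neither NFC nor NFD.
mixed = "l\u00fc\u0300"
print(unicodedata.normalize("NFC", mixed) == composed)
print(unicodedata.normalize("NFD", mixed) == nfd)
```

This illustrates the point nicely: all three strings are canonically equivalent, but only an application-level decision can say which of them is the "right" representation for a given context.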