[Python-Dev] Re: String module
Guido van Rossum
guido@python.org
Wed, 29 May 2002 21:29:25 -0400
> This reminds me that I often miss, in the standard `ctype.h' and related,
> a function that would un-combine a character into its base character and
> its diacritic, and the complementary re-combining function.
>
> Even if this might be easier for Latin-1, it is difficult to design
> something general enough. Characters may have a more complex structure
> than a mere base and single diacritic. I do not know what to suggest.
I bet the Unicode standard has a standard way to do this. Maybe we
can implement that, and then project the same interface on 8-bit
characters? Of course, character encoding issues might get in the way
if <ctype.h> doesn't provide the data -- so you may be better off
doing this in Unicode only. (We must never assume that 8-bit strings
contain Latin-1.)
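
For illustration only, here is a minimal sketch of what such a pair of
functions could look like on top of the Unicode canonical decomposition
and composition forms (NFD and NFC), written against the unicodedata
module as it exists in current Python 3. The names split_base_and_marks
and recombine are hypothetical, not an existing or proposed API, and
treating "every combining character" as the diacritics is an assumption
of this sketch. 8-bit data has to be decoded with an explicitly stated
encoding first; nothing here guesses Latin-1.

```python
import unicodedata


def split_base_and_marks(ch):
    """Split one character into (base, combining marks).

    Hypothetical helper: uses canonical decomposition (NFD) and treats
    every code point with a non-zero combining class as a diacritic.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    return base, marks


def recombine(base, marks):
    """Complementary helper: compose base + marks back via NFC."""
    return unicodedata.normalize("NFC", base + marks)


# A single diacritic: U+00E9 (e with acute) -> "e" + U+0301.
base, marks = split_base_and_marks("\u00e9")

# More than one diacritic: U+1EC7 (e with circumflex and dot below).
base2, marks2 = split_base_and_marks("\u1ec7")

# 8-bit input: the caller must name the encoding explicitly.
raw = b"\xe9"
base3, marks3 = split_base_and_marks(raw.decode("latin-1"))
```

Note that a character with no canonical decomposition (U+00F8, o with
stroke, for example) comes back unchanged with an empty marks string,
so the result is only as good as the decomposition data in the Unicode
database.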
--Guido van Rossum (home page: http://www.python.org/~guido/)