[Python-Dev] Re: String module

Guido van Rossum guido@python.org
Wed, 29 May 2002 21:29:25 -0400


> This reminds me that I often miss, in the standard `ctype.h' and related,
> a function that would un-combine a character into its base character and
> its diacritic, and the complementary re-combining function.
> 
> Even if this might be easier for Latin-1, it is difficult to design
> something general enough.  Characters may have a more complex structure
> than a mere base and single diacritic.  I do not know what to suggest.

I bet the Unicode standard has a standard way to do this.  Maybe we
can implement that, and then project the same interface on 8-bit
characters?  Of course character encoding issues might get in the way
if <ctype.h> doesn't provide the data -- so you may be better off
doing this in Unicode only.  (We must never assume that 8-bit strings
contain Latin-1.)
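
As a rough illustration of the Unicode-only route, here is a minimal
sketch. It assumes a Python whose standard unicodedata module provides
normalize(), and it assumes that canonical decomposition (NFD) and
composition (NFC) are the "standard way" in question. The helper names
split_combining and recombine are hypothetical, not an existing API.

```python
import unicodedata


def split_combining(ch):
    """Split one character into (base, marks) using canonical decomposition.

    Assumption: `ch` is a single, non-empty Unicode character. Everything
    after the first code point of the NFD form is treated as "marks", which
    fits accented Latin letters but is not a fully general answer (Hangul
    syllables, for example, decompose into jamo, not diacritics).
    """
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], decomposed[1:]


def recombine(base, marks):
    """Recombine a base character and its marks using canonical composition."""
    return unicodedata.normalize("NFC", base + marks)


# U+00E9 (e with acute) splits into "e" plus U+0301 (combining acute accent).
base, marks = split_combining("\u00e9")
restored = recombine(base, marks)

# U+1EC7 carries two diacritics (dot below and circumflex), which shows that
# a character can have more structure than one base plus one mark.
base2, marks2 = split_combining("\u1ec7")
```

Note that both helpers take Unicode strings only. An 8-bit string would
have to be decoded with a known encoding first, which is exactly the
caveat about not assuming Latin-1.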

--Guido van Rossum (home page: http://www.python.org/~guido/)