[I18n-sig] How does Python Unicode treat surrogates?
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Tue, 26 Jun 2001 01:32:25 +0200
> Does that make sense?
>
> I know I am hindered by a lack of understanding of Unicode
> hairsplitting, angels-on-a-pin-dancing details; if I'm missing
> something, it's likely that many other people don't know the details
> either, so an explanation would be much appreciated!
I don't think you are missing any detail; I guess you are fully aware
that you are throwing one of Unicode's biggest strengths out of the
window :-) namely the possibility to index characters, not the
internal representation.
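To illustrate the difference between indexing characters and indexing
the internal representation, here is a minimal sketch. It assumes a
current Python 3, where a str is indexed by code point; the helper
utf16_code_units is a hypothetical name made up for this example and
only mimics what a UTF-16 based representation would expose.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

# U+10000 is the first code point outside the Basic Multilingual Plane.
s = "a\U00010000b"
units = utf16_code_units(s)

print(len(s))      # number of code points
print(len(units))  # number of UTF-16 code units
print([hex(u) for u in units])
```

Indexing the string gives s[1] == "\U00010000", a whole character,
whereas units[1] and units[2] are the two halves of a surrogate pair
and mean nothing on their own.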
As for Unicode hairsplitting: I think combining characters *are*
different in that respect; they are code points on their own, even
though they might have a zero-width representation. Also,
normalization forms can help with combining characters; they don't
help with surrogates.
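A small sketch of that distinction, using the standard unicodedata
module of a current Python 3 (an assumption; the sample strings are
made up for illustration): normalization can fold a base letter plus
combining accent into one code point, but it leaves a non-BMP
character alone, and that character still needs two code units in
UTF-16.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points.
decomposed = "e\u0301"
# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# A combining character is a code point of its own, with a nonzero
# canonical combining class.
accent_class = unicodedata.combining("\u0301")

# U+10000 lies outside the BMP; no normalization form changes it.
astral = "\U00010000"
unchanged = all(
    unicodedata.normalize(form, astral) == astral
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
# It still takes a surrogate pair (two 16-bit units) in UTF-16.
astral_units = len(astral.encode("utf-16-be")) // 2

print(len(decomposed), len(composed), accent_class)
print(unchanged, astral_units)
```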
Regards,
Martin