[I18n-sig] How does Python Unicode treat surrogates?

Tom Emerson tree@basistech.com
Mon, 25 Jun 2001 21:01:58 -0400

Martin v. Loewis writes:
> To my knowledge, it only differs in minor points, which is only caused
> by different release dates (at one time, Unicode is behind, at another
> time, the ISO standard).

The Unicode Technical Committee and WG2 are striving to keep the two
standards in lockstep as much as possible. Unfortunately the process
of adding to an ISO standard is much more involved and time-consuming
than the one required for Unicode.

> End users typically view it as Unicode, whereas standards bodies and
> agencies typically view it as ISO 10646 (e.g. C, C++, and Posix all
> refer to ISO 10646, Microsoft refers to Unicode).

The two standards are compatible code point for code point. The
primary difference is that Unicode provides character property
information (general category, directionality, and so on) that 10646
does not, and the UTC strives to standardize mapping tables for new
encodings (e.g., GB 18030 and JIS X 0213-2000).
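
To make the property point concrete, here is a minimal sketch using
Python's standard unicodedata module, which exposes data from the
Unicode Character Database. The describe() helper is a hypothetical
name of my own, and the syntax assumes a current Python 3 interpreter
rather than the releases available when this was written. None of the
fields other than the code point itself come from ISO 10646.

```python
import unicodedata


def describe(ch):
    """Collect a few Unicode character properties for one character.

    Hypothetical helper for illustration; only the unicodedata calls
    are part of the standard library.
    """
    return {
        "code_point": "U+%04X" % ord(ch),
        # Second argument is the fallback for characters without a name.
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),    # general category
        "bidi": unicodedata.bidirectional(ch),   # bidirectional class
        "combining": unicodedata.combining(ch),  # canonical combining class
    }


latin = describe("A")
cjk = describe("\u4e2d")

for info in (latin, cjk):
    print(info)
```

Both standards assign the same code points to these characters; the
name lookup, general category, and bidirectional class shown here are
the kind of data that the Unicode side supplies.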

Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"