[I18n-sig] How does Python Unicode treat surrogates?

Tom Emerson tree@basistech.com
Mon, 25 Jun 2001 09:55:07 -0400

Guido van Rossum writes:
> I'm all for taking the lazy approach and letting applications that
> need surrogate support do it themselves, at the application level.

Meaning what? Leaving it up to the application to be entirely
responsible for handling surrogates is a mistake. As was stated
earlier in the thread (apologies, I don't have the message around to
make the appropriate attribution), surrogates are an implementation
detail: to the user or application developer, the presence of a
surrogate pair needs to be transparent.

As long as the Unicode support functionality groks surrogates
correctly (that is, fully implements UTF-16), the issue becomes a
small one for the end user. The scanner would need to be modified to
support Unicode escapes for values up to 0x10FFFF. Internally, values
above 0xFFFF would be represented as surrogate pairs.
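
To make the mapping concrete, here is a minimal sketch of the standard
UTF-16 arithmetic for turning a code point above 0xFFFF into a
surrogate pair and back. The function names are hypothetical, chosen
for illustration only; this is not the interpreter's code or any
existing Python API.

```python
def to_surrogate_pair(code_point):
    """Split a code point in 0x10000..0x10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: 0x%X" % code_point)
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E (MUSICAL SYMBOL G CLEF) becomes the pair D834 DD1E.
high, low = to_surrogate_pair(0x1D11E)
print("%04X %04X" % (high, low))
```

If the library does this everywhere it indexes, slices, or compares
strings, the application never has to see the pair at all.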

Put the burden of these multibyte representations on the library
implementor, not the end-user.


Tom Emerson                                          Basis Technology Corp.
Sr. Sinostringologist                              http://www.basistech.com
  "Beware the lollipop of mediocrity: lick it once and you suck forever"