Unicode utf-8 doesn't do back-and-forth?

Tim Peters tim.one at comcast.net
Tue Jul 2 22:43:00 EDT 2002


[Mike Fletcher]
> ...
> No clue what a surrogate is for, but I suppose there's no point
> including them in a character-classes set if they're designed
> specifically as unicode internal chars.

In a galaxy far, far away, a bunch of geeks were sitting around bored, when
one had a bright idea.  "I know!" she exclaimed.  "Let's invent a character
encoding that covers all the world's character sets in one gulp!"

"Hmm!" mused the oldest geek, who had been around long enough not to be
blinded by American assumptions.  "That's a lot of characters -- we might
need 10, or even 11, bits!"  "So let's use 16!" countered the rest.  "That's
way more than enough to last us forever 10 times over!"

The rest is history, and "surrogates" are a hack to get the effect of 4 more
bits (way more than enough to last us forever 10 times over).  In
pre-Unicode-speak, you'd call them "escape codes".
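
For the curious, the escape-code arithmetic can be sketched in a few lines of Python. This is only an illustration of the UTF-16 surrogate scheme, not anything from the thread: the function names to_surrogates and from_surrogates are made up for the example. A pair carries 10 + 10 = 20 bits of payload on top of the original 16-bit range, which is roughly where the "4 more bits" come from.

```python
# Hypothetical helper names; the constants are the standard UTF-16 surrogate ranges.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded with surrogates")
    offset = cp - 0x10000                    # 20-bit value
    high = HIGH_START + (offset >> 10)       # top 10 bits
    low = LOW_START + (offset & 0x3FF)       # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    high, low = to_surrogates(0x1D11E)       # MUSICAL SYMBOL G CLEF
    print("U+1D11E -> %04X %04X" % (high, low))
    print("back    -> U+%X" % from_surrogates(high, low))
```

Code points in the range U+D800..U+DFFF are reserved for this purpose and never stand for characters by themselves, which is why they make little sense in a character-classes set.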

not-to-be-confused-with-escape-code-points-ly y'rs  - tim





