Unicode utf-8 doesn't do back-and-forth?

John Machin sjmachin at lexicon.net
Mon Jul 8 19:00:20 EDT 2002


Tim Peters <tim.one at comcast.net> wrote in message news:<mailman.1025664245.32466.python-list at python.org>...

> The rest is history, and "surrogates" are a hack to get the effect of 4 more
> bits (way more than enough to last us forever 10 times over).  In
> pre-Unicode-speak, you'd call them "escape codes".

4 more bits? It takes 21 bits to encode the 2**20 possible
surrogate-described characters plus the basic 64K characters, which is
5 more bits than 16, not 4:
assert 21 - 16 == 5
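
For anyone who wants to check the counting rather than take the 21 on
trust, here is a minimal sketch. It assumes the standard UTF-16 surrogate
ranges (high 0xD800-0xDBFF, low 0xDC00-0xDFFF); the variable names are
mine, purely for illustration.

```python
# Each surrogate range holds 1024 code units; a pair selects one character.
high_surrogates = 0xDBFF - 0xD800 + 1   # 1024 possible high surrogates
low_surrogates = 0xDFFF - 0xDC00 + 1    # 1024 possible low surrogates

# Characters reachable only through a surrogate pair: 1024 * 1024 == 2**20.
supplementary = high_surrogates * low_surrogates

# The basic 64K characters that fit in a single 16-bit unit.
basic = 2 ** 16

# Highest value that has to be representable when counting from zero.
highest = supplementary + basic - 1     # 0x10FFFF

# Bits needed to hold that value, and how many more that is than 16.
bits_needed = highest.bit_length()
extra_bits = bits_needed - 16

print(hex(highest), bits_needed, extra_bits)
```

Four extra bits would give 20 bits in total, which covers only 2**20
values, so the basic 64K on top of the 2**20 surrogate-described
characters is what pushes the total to 21.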


