[Python-Dev] utf-8 issue thread question
Martin v. Loewis
martin@v.loewis.de
11 Sep 2002 08:23:17 +0200
Guido van Rossum <guido@python.org> writes:
> One thing to watch out for: I believe that the bit pattern that's
> encoded is not the bit pattern of the full unicode character, but
> 2**16 less. This allows one to encode 2**16 more characters, at the
> cost of some extra complexity.
Correct. That allows one to encode a total of 17 planes in Unicode, a
plane being 2**16 characters. Therefore, saying that Unicode is 20
bits is somewhat imprecise; it's better to say that it is 21 bits, since
the highest code point, U+10FFFF, needs 21 bits.
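A minimal Python sketch of the arithmetic, assuming the UTF-16 surrogate-pair scheme is what is being discussed. The helper names to_surrogates and from_surrogates are hypothetical and only for illustration. Subtracting 2**16 from a supplementary code point leaves a 20-bit value, split into two 10-bit halves. Because of that offset, the pairs cover U+10000..U+10FFFF rather than overlapping the first plane, and 17 planes of 2**16 code points end at U+10FFFF, a 21-bit number.

```python
from typing import Tuple

PLANE_SIZE = 2**16             # code points per plane
TOTAL = 17 * PLANE_SIZE        # 0x110000 code points in all
MAX_CP = TOTAL - 1             # 0x10FFFF, the highest code point


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair (hypothetical helper)."""
    if not PLANE_SIZE <= cp <= MAX_CP:
        raise ValueError("not a supplementary code point")
    v = cp - PLANE_SIZE            # subtract 2**16: what remains fits in 20 bits
    high = 0xD800 + (v >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (v & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Reassemble the code point, adding the 2**16 offset back (hypothetical helper)."""
    return PLANE_SIZE + ((high - 0xD800) << 10) + (low - 0xDC00)


print(hex(MAX_CP), MAX_CP.bit_length())            # 0x10ffff 21
print((MAX_CP - PLANE_SIZE).bit_length())          # 20
print([hex(u) for u in to_surrogates(0x1F600)])    # ['0xd83d', '0xde00']
```

The pair computed for U+1F600 matches what Python's own "utf-16-be" codec produces for that character.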
Regards,
Martin