[Python-Dev] utf-8 issue thread question

Martin v. Loewis martin@v.loewis.de
11 Sep 2002 08:23:17 +0200


Guido van Rossum <guido@python.org> writes:

> One thing to watch out for: I believe that the bit pattern that's
> encoded is not the bit pattern of the full unicode character, but
> 2**16 less.  This allows one to encode 2**16 more characters, at the
> cost of some extra complexity.

Correct. Subtracting 2**16 lets the 20-bit value reach 16 planes beyond
the first one, so Unicode has a total of 17 planes, a plane being 2**16
characters. Therefore, saying that Unicode is 20 bits is somewhat
imprecise; it's better to say that it is 21 bits.
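
To make the arithmetic concrete, here is a minimal sketch of the
UTF-16 surrogate computation. The function names are made up for
illustration and are not part of Python or any library; the constants
0xD800 and 0xDC00 are the standard starts of the high and low
surrogate ranges.

```python
def to_surrogates(cp):
    """Split a code point above the first plane into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the surrogate-encodable range")
    v = cp - 0x10000              # the "2**16 less" value, 0 .. 0xFFFFF (20 bits)
    high = 0xD800 | (v >> 10)     # upper 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)    # lower 10 bits go into the low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair; adds the 2**16 offset back."""
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00))


MAX_CODE_POINT = 0x10FFFF                   # last character of the 17th plane
PLANES = (MAX_CODE_POINT + 1) // 2**16      # 17
BITS = MAX_CODE_POINT.bit_length()          # 21
```

For example, to_surrogates(0x1D11E) gives (0xD834, 0xDD1E), and
from_surrogates applied to that pair returns 0x1D11E again. The largest
pair, (0xDBFF, 0xDFFF), maps to 0x10FFFF, which needs 21 bits.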

Regards,
Martin