Micro Python -- a lean and efficient implementation of Python 3
Terry Reedy
tjreedy at udel.edu
Wed Jun 4 03:00:05 EDT 2014
On 6/4/2014 1:55 AM, Ian Kelly wrote:
>
> On Jun 3, 2014 11:27 PM, "Steven D'Aprano" <steve at pearwood.info> wrote:
> > For technical reasons which I don't fully understand, Unicode only
> > uses 21 of those 32 bits, giving a total of 1114112 available code
> > points.
>
> I think mainly it's to accommodate UTF-16. The surrogate pair scheme is
> sufficient to encode up to 16 supplementary planes, so if Unicode were
> allowed to grow any larger than that, UTF-16 would no longer be able to
> encode all codepoints.
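Both numbers in the quoted text can be checked in a few lines of Python. This is a minimal sketch. `to_surrogate_pair` is a hypothetical helper written for illustration, not a stdlib function. It splits a supplementary code point into a UTF-16 surrogate pair. Each surrogate carries 10 bits, so pairs cover 2**20 code points, which is exactly 16 planes of 65536. Adding the BMP gives 17 planes, or Steven's 1114112.

```python
from __future__ import annotations

import sys

# Unicode code space: the BMP plus 16 supplementary planes, 65,536 code points each.
PLANE_SIZE = 0x10000
NUM_PLANES = 17
assert NUM_PLANES * PLANE_SIZE == 0x10FFFF + 1 == 1114112


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


# The largest possible pair, U+DBFF U+DFFF, lands exactly on U+10FFFF.
print([hex(u) for u in to_surrogate_pair(0x10FFFF)])  # ['0xdbff', '0xdfff']
print(sys.maxunicode == 0x10FFFF)                     # True
```

Nothing above U+10FFFF can be written as a surrogate pair. So growing Unicode past 17 planes would leave UTF-16 unable to represent the new code points.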
I believe the original UTF-8 used up to 6 bytes per char to encode 2**31
potential chars. Capping sequences at 4 bytes still leaves room for 2**21,
but RFC 3629 (2003) also cut UTF-8 off at U+10FFFF to match the UTF-16
limit Ian describes, so UTF-8 was revised down (unusual ;-).
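The bit counts follow from the byte layout. The sketch below assumes the original 1-6 byte scheme. An n-byte sequence starts with a lead byte holding n one-bits and a zero, and each `10xxxxxx` continuation byte adds 6 payload bits. `utf8_payload_bits` is a hypothetical helper for illustration. The last part uses Python's strict UTF-8 codec to show that a 4-byte sequence for U+110000 is rejected, even though 4 bytes could physically hold it.

```python
def utf8_payload_bits(n_bytes: int) -> int:
    """Payload bits in an n-byte sequence under the original 1-6 byte UTF-8 layout."""
    if not 1 <= n_bytes <= 6:
        raise ValueError("original UTF-8 sequences were 1 to 6 bytes long")
    if n_bytes == 1:
        return 7  # 0xxxxxxx: plain ASCII
    # Lead byte: n one-bits and a zero leave (7 - n) payload bits;
    # each 10xxxxxx continuation byte adds 6 more.
    return (7 - n_bytes) + 6 * (n_bytes - 1)


for n in range(1, 7):
    print(f"{n} byte(s): {utf8_payload_bits(n)} bits")
# 6 bytes -> 31 bits (2**31 values); 4 bytes -> 21 bits (2**21 values)

# Modern UTF-8 stops at U+10FFFF, below the 0x1FFFFF that 4 bytes could hold.
print(len(chr(0x10FFFF).encode("utf-8")))  # 4
try:
    # U+110000 written in the 4-byte layout; a strict decoder must refuse it.
    b"\xf4\x90\x80\x80".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```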
--
Terry Jan Reedy
More information about the Python-list mailing list