<p dir="ltr"><br>

On Jun 3, 2014 11:27 PM, "Steven D'Aprano" &lt;<a href="mailto:steve@pearwood.info">steve@pearwood.info</a>&gt; wrote:<br>

> For technical reasons which I don't fully understand, Unicode only<br>

> uses 21 of those 32 bits, giving a total of 1114112 available code<br>

> points.</p>

<p dir="ltr">I think it's mainly there to accommodate UTF-16. A surrogate pair carries 20 bits of payload, which is enough for exactly 16 supplementary planes of 65,536 code points each. Add the Basic Multilingual Plane and you get 17 planes, or 1114112 code points. If Unicode were allowed to grow any larger than that, UTF-16 could no longer encode every code point.</p>
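<p dir="ltr">To make the arithmetic concrete, here is a minimal Python sketch of how a supplementary code point is split into a surrogate pair. <code>to_surrogates</code>, <code>BMP_SIZE</code> and <code>SUPPLEMENTARY_SIZE</code> are hypothetical names for illustration, not stdlib functions. The result is cross-checked against Python's own <code>utf-16-be</code> codec.</p>

```python
from typing import Tuple

BMP_SIZE = 0x10000            # 65,536 code points in plane 0 (the BMP)
SUPPLEMENTARY_SIZE = 1 << 20  # a surrogate pair carries 20 bits of payload


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Hypothetical helper: split U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % cp)
    offset = cp - 0x10000            # 20-bit value in 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    # 2**20 payload values / 65,536 per plane = 16 supplementary planes
    print(SUPPLEMENTARY_SIZE // BMP_SIZE)                # 16
    # BMP plus the 16 supplementary planes gives the quoted total
    print(BMP_SIZE + SUPPLEMENTARY_SIZE)                 # 1114112
    # The highest code point maps to the highest surrogate of each kind
    print([hex(u) for u in to_surrogates(0x10FFFF)])     # ['0xdbff', '0xdfff']
    # Cross-check against Python's built-in UTF-16 codec
    print(chr(0x1F600).encode("utf-16-be").hex())        # d83dde00
    print(["%04x" % u for u in to_surrogates(0x1F600)])  # ['d83d', 'de00']
```

<p dir="ltr">There is simply no room for an 18th plane: one more would need a 21st bit of payload, and the surrogate ranges D800&ndash;DBFF and DC00&ndash;DFFF only supply 10 bits each. Python itself enforces the limit, since <code>chr(0x110000)</code> raises <code>ValueError</code>.</p>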


<p dir="ltr">Another benefit of fixing the size is that it leaves the other 11 bits of each 32-bit UTF-32 code unit free for packing in ancillary data.</p>
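<p dir="ltr">As a rough illustration of that idea, here is a minimal Python sketch that stores an 11-bit tag above a 21-bit code point in a single 32-bit word. <code>pack_unit</code> and <code>unpack_unit</code> are hypothetical names, and the scheme is an assumption for illustration, not any standard format. Note that a packed word is not valid UTF-32 until the tag bits are masked off again.</p>

```python
from typing import Tuple

CODE_POINT_BITS = 21                         # enough for U+0000..U+10FFFF
CODE_POINT_MASK = (1 << CODE_POINT_BITS) - 1  # 0x1FFFFF
SPARE_BITS = 32 - CODE_POINT_BITS             # 11 bits left over in a 32-bit unit
SPARE_MAX = (1 << SPARE_BITS) - 1             # 2047


def pack_unit(cp: int, extra: int) -> int:
    """Hypothetical: put an 11-bit tag above a 21-bit code point in one 32-bit word."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if not 0 <= extra <= SPARE_MAX:
        raise ValueError("ancillary value does not fit in 11 bits")
    return (extra << CODE_POINT_BITS) | cp


def unpack_unit(word: int) -> Tuple[int, int]:
    """Hypothetical: recover (code point, tag); mask before treating it as UTF-32."""
    return word & CODE_POINT_MASK, word >> CODE_POINT_BITS


if __name__ == "__main__":
    # Tag each character of a string with its index
    words = [pack_unit(ord(ch), i) for i, ch in enumerate("hello")]
    print("".join(chr(unpack_unit(w)[0]) for w in words))  # hello
    print([unpack_unit(w)[1] for w in words])              # [0, 1, 2, 3, 4]
    # Even the largest tag on the largest code point still fits in 32 bits
    print(pack_unit(0x10FFFF, SPARE_MAX) < 2**32)          # True
```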