Re: [Python-Dev] Internal representation of strings and Micropython

5 Jun 2014


      Serhiy Storchaka writes:
...
Yes, I remember. I thing that hybrid FSR-UTF16 (like FSR, but UTF-16 is 
used instead of UCS4) is the better choice for CPython. I suppose that 
with populating emoticons and other icon characters in nearest 5 or 10 
years, even English text will often contain astral characters. And 
spending 4 bytes per character if long text contains one astral 
character looks too prodigally.
Why use something that complex if you don't have to?  For the use case
you have in mind, just map them into private space.  If you really
want to be aggressive, use surrogate space, too (anything that cares
what a scalar represents should be trapping on non-scalars, catch that
exception and look up the char -- dangerous, though, because such
exceptions are probably all over the place).

Re: [Python-Dev] Internal representation of strings and Micropython

Stephen J. Turnbull