Serhiy Storchaka writes:
Yes, I remember. I thing that hybrid FSR-UTF16 (like FSR, but UTF-16 is used instead of UCS4) is the better choice for CPython. I suppose that with populating emoticons and other icon characters in nearest 5 or 10 years, even English text will often contain astral characters. And spending 4 bytes per character if long text contains one astral character looks too prodigally.
Why use something that complex if you don't have to? For the use case you have in mind, just map them into private space. If you really want to be aggressive, use surrogate space, too (anything that cares what a scalar represents should be trapping on non-scalars, catch that exception and look up the char -- dangerous, though, because such exceptions are probably all over the place).