On Sat, Oct 26, 2019, at 20:26, David Mertz wrote:
Absolutely, utf-8 is a wonderful encoding. And indeed, worst case is the same storage requirement as utf-16 or utf-32. For O(1) random access into all strings, we have to eat 32 bits per character, one way or the other, but of course there are space/speed trade-offs one could make for intermediate behavior.
A string representation consisting of (say) a UTF-8 string, plus an auxiliary list of byte indices for, say, 256-codepoint-long chunks [along with perhaps a per-chunk flag to say whether the chunk is all-ASCII], would provide O(1) random access. Of course, despite both being O(1), "single index access" vs "single index access, then either another index access or up to 256 iterate-forward operations" aren't *really* the same speed.
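
To make that concrete, here is a rough sketch of what I have in mind. It is only an illustration, not how CPython stores strings, and the names (ChunkedUtf8, CHUNK, _seq_len) are made up for this example. It keeps the UTF-8 bytes, one byte offset per 256-codepoint chunk, and an all-ASCII flag per chunk.

```python
def _seq_len(lead_byte: int) -> int:
    """Length in bytes of the UTF-8 sequence starting with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte < 0xE0:
        return 2
    if lead_byte < 0xF0:
        return 3
    return 4


class ChunkedUtf8:
    """UTF-8 bytes plus a byte offset for every CHUNK code points."""

    CHUNK = 256  # code points per chunk (the "say, 256" above)

    def __init__(self, text: str):
        self.data = text.encode("utf-8")
        self.length = len(text)
        self.offsets = []  # byte offset at which each chunk starts
        self.ascii = []    # True if the chunk holds only ASCII
        pos = 0
        for start in range(0, len(text), self.CHUNK):
            piece = text[start:start + self.CHUNK]
            encoded = piece.encode("utf-8")
            self.offsets.append(pos)
            # Byte count equals code point count only for pure ASCII.
            self.ascii.append(len(encoded) == len(piece))
            pos += len(encoded)

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("string index out of range")
        chunk, within = divmod(index, self.CHUNK)
        pos = self.offsets[chunk]
        if self.ascii[chunk]:
            # Fast path: one byte per code point, so a second index suffices.
            return chr(self.data[pos + within])
        # Slow path: step forward over at most CHUNK - 1 code points.
        for _ in range(within):
            pos += _seq_len(self.data[pos])
        end = pos + _seq_len(self.data[pos])
        return self.data[pos:end].decode("utf-8")
```

The work per lookup is bounded by the chunk size rather than the string length, which is the sense in which it is O(1). The table costs one integer and one flag per 256 code points.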