PEP 393

The Unicode string type is changed to support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes).

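... Ah, OK. I get it. The one-byte representation covers Latin-1 (code points up to U+00FF); only for a pure ASCII string does that happen to match UTF-8. If the string contains larger code points, the internal representation is UCS-2 (two bytes per code point, BMP only, never surrogate pairs) or UCS-4 (four bytes per code point, the same layout as UTF-32). Every character in a given string uses the same width, so indexing stays a constant-time offset calculation.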
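To make the three widths concrete, here is a small Python sketch. `pep393_kind` is a hypothetical helper that mirrors the PEP's rule for choosing 1, 2, or 4 bytes per code point from the largest ordinal. The `sys.getsizeof` comparison leans on CPython implementation details (3.3 or later); other interpreters may report different sizes or none at all.

```python
import sys


def pep393_kind(s: str) -> int:
    """Bytes per code point PEP 393 would pick for s (hypothetical helper)."""
    largest = max(map(ord, s), default=0)
    if largest <= 0xFF:      # Latin-1 range, not just ASCII
        return 1
    if largest <= 0xFFFF:    # UCS-2: BMP only, no surrogate pairs
        return 2
    return 4                 # UCS-4, same layout as UTF-32


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage from how sys.getsizeof grows (CPython)."""
    # Both strings are freshly built with the same kind, so the fixed
    # object header cancels out and only n extra characters remain.
    return (sys.getsizeof(ch * (n + 2)) - sys.getsizeof(ch * 2)) // n


for ch in ["a", "\u00e9", "\u0100", "\U0001F600"]:
    print(repr(ch), pep393_kind(ch), bytes_per_char(ch))
```

On CPython the two numbers on each line should agree: 1 for "a" and for the Latin-1 "é", 2 for U+0100, and 4 for the emoji.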

On Sun, Oct 27, 2019, 12:19 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sun, Oct 27, 2019 at 2:37 PM David Mertz <mertz@gnosis.cx> wrote:
> What does actual CPython do currently to find that s[1_000_000], assuming utf-8 internal representation?
>

Mu.

CPython does not have a UTF-8 internal representation.
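A rough Python sketch of why the quoted s[1_000_000] question needs no scan: with a fixed width per code point, the i-th character starts at byte offset i * width, whereas finding the i-th code point in a UTF-8 buffer means walking the bytes from the start. All three functions are hypothetical illustrations over `bytes` buffers, not CPython's actual C code.

```python
def encode_fixed_width(s: str) -> tuple[bytes, int]:
    """Pack s into a little-endian fixed-width buffer, PEP 393 style (sketch)."""
    largest = max(map(ord, s), default=0)
    width = 1 if largest <= 0xFF else 2 if largest <= 0xFFFF else 4
    buf = b"".join(ord(c).to_bytes(width, "little") for c in s)
    return buf, width


def index_fixed_width(buf: bytes, width: int, i: int) -> str:
    """O(1): jump straight to byte offset i * width."""
    if not 0 <= i < len(buf) // width:
        raise IndexError("string index out of range")
    start = i * width
    return chr(int.from_bytes(buf[start:start + width], "little"))


def index_utf8(data: bytes, i: int) -> str:
    """O(i): count code points by skipping UTF-8 continuation bytes."""
    count = -1
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # lead byte: a new code point starts here
            count += 1
            if count == i:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError("string index out of range")


s = "x" * 1_000_000 + "\u0100"
buf, width = encode_fixed_width(s)
print(width, index_fixed_width(buf, width, 1_000_000))
print(index_utf8(s.encode("utf-8"), 1_000_000))
```

Both lookups return the same character, but only the UTF-8 version has to touch the million bytes in front of it.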

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JZF35M3NBU42EH5Y37AAN4BCXQCZ63B2/