Glyphs and graphemes [was Re: Cult-like behaviour]
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Jul 17 04:04:29 EDT 2018
On Tue, 17 Jul 2018 15:20:16 +0900, INADA Naoki wrote (replying to Marko):
> I still don't understand what your original point is. I think UTF-8 vs
> UTF-32 is totally different from Python 2 vs 3.
>
> For example, strings in Rust and Swift (2010s languages!) are *valid*
> UTF-8. There is a strong separation between byte arrays and strings, even
> though they use UTF-8. They look similar to Python 3, not Python 2.
>
> And Python can use UTF-8 for its internal encoding in the future. AFAIK,
> PyPy is trying it now. If they succeed, I want to try porting it to
> CPython after we have removed the legacy Unicode APIs. (ref PEP 393)
I'm not sure about PyPy, but I'm fairly certain that MicroPython uses
UTF-8.
I would be very interested to see the results of using UTF-8 in CPython.
At the least, it would remove the need to keep a separate UTF-8
representation in the string object, as CPython does now. It might even be
more compact, although a naive implementation would lose the ability to
do constant-time indexing into strings.
That might be a tradeoff worth making, if indexing remained sufficiently
fast.
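To make the tradeoff concrete, here is a rough sketch in pure Python. It is
an illustration only, not how CPython, PyPy or MicroPython actually do it.
The helper names char_width and utf8_index are made up for this example, and
the UTF-32 encoding is only a stand-in for the four-bytes-per-code-point
layout that PEP 393 falls back to when a string contains a non-BMP character
(object header overhead is ignored).

```python
def char_width(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("not a UTF-8 lead byte")


def utf8_index(buf: bytes, i: int) -> str:
    """Return code point number i of valid UTF-8 `buf`.

    Naive version: walks the buffer from the start, so the cost is O(i)
    rather than O(1).
    """
    if i < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    pos = 0
    for _ in range(i):
        if pos >= len(buf):
            raise IndexError("string index out of range")
        pos += char_width(buf[pos])
    if pos >= len(buf):
        raise IndexError("string index out of range")
    return buf[pos:pos + char_width(buf[pos])].decode("utf-8")


# Mostly-ASCII text with a single astral character at the end.
text = "a" * 100 + "\U0001F40D"
utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")   # stand-in for 4 bytes per code point

print(len(utf8), len(utf32))                 # 104 404
print(utf8_index(utf8, 100) == text[100])    # True
```

The size difference is where the "more compact" hope comes from: one wide
character forces every code point in the fixed-width layout to four bytes,
while UTF-8 only pays for the characters that need it. The loop in
utf8_index is the price. A real implementation would presumably need some
kind of index or cache on top of this to keep indexing acceptably fast.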
--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson