Glyphs and graphemes [was Re: Cult-like behaviour]
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Jul 16 21:26:55 EDT 2018
On Mon, 16 Jul 2018 22:51:32 +0300, Marko Rauhamaa wrote:
> All UTF-8. No unicode strings.
That just means you are re-implementing the bits of Unicode you care
about (which may be "nothing at all") as UTF-8. If your application is
nothing but middleware squirting bytes from one layer to another layer,
that might be all you need care about.
But then you're not processing text in your application, and why should
your experience in not-processing-text be given any weight over the
experiences of those who do process text?
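To make "re-implementing the bits of Unicode you care about" concrete, here is a minimal sketch of what even a simple truncation needs once you work on raw UTF-8 bytes. The helper name `truncate_utf8` is hypothetical, not from any library, and it assumes its input is already valid UTF-8.

```python
# Sample text written with escapes so the code points are unambiguous.
text = "na\u00efve caf\u00e9"
data = text.encode("utf-8")

print(len(text))  # 10 code points
print(len(data))  # 12 bytes: the two accented letters take two bytes each


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 data to at most `limit` bytes without splitting a code point.

    Hypothetical helper for illustration only.
    """
    if limit >= len(data):
        return data
    end = limit
    # Continuation bytes have the bit pattern 10xxxxxx. If the cut would land
    # on one, back up to the lead byte of that sequence and drop it entirely.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# A naive byte slice leaves half a character behind and no longer decodes.
try:
    data[:3].decode("utf-8")
except UnicodeDecodeError as exc:
    print("naive slice failed:", exc.reason)

print(truncate_utf8(data, 3))            # b'na'
print(truncate_utf8(data, 4).decode())   # the first three characters
print(text[:3])                          # the str slice needs no such care
```

With a str, the slice `text[:3]` is already correct; the byte-level version has to know about lead and continuation bytes, which is exactly the kind of Unicode knowledge that gets rebuilt by hand.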
And later, in another post:
> UTF-8 bytes can only represent the first 128 code points of Unicode.
This is DailyWTF material. Perhaps you want to rethink your wording and
maybe even learn a bit more about Unicode and the UTF encodings before
making such statements.
The idea that UTF-8 bytes cannot represent the whole of Unicode is not
even wrong. Of course a *single* byte cannot, but a single byte is not
"UTF-8 bytes": UTF-8 encodes every code point as a sequence of one to
four bytes.
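This is easy to check at the interpreter. The sketch below encodes a few sample code points, one of them outside the Basic Multilingual Plane, and then round-trips every code point through UTF-8. The function names are illustrative only. Surrogates (U+D800 to U+DFFF) are skipped because they are not valid in UTF-8 and Python refuses to encode them.

```python
def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of one character as space-separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_bytes(ch)}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80


def all_code_points_round_trip() -> bool:
    """Encode and decode every non-surrogate code point via UTF-8."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        ch = chr(cp)
        if ch.encode("utf-8").decode("utf-8") != ch:
            return False
    return True


print(all_code_points_round_trip())  # True
```

Only the first 128 code points fit in a single byte; everything else takes two, three, or four.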
--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson