27 Oct
2019
9:43 a.m.
On Sun, Oct 27, 2019, at 03:39, Andrew Barnert via Python-ideas wrote:
(Actually, IIRC, one of the two has a str type that, despite being 2.x, is unicode rather than bytes, but with some extra undocumented functionality to smuggle bytes around in a str and have it sometimes work.)
I do like the way GNU Emacs represents strings: abstractly, a string can contain any character, or any raw byte > 127, which is kept distinct from every character. Concretely, IIRC, strings are represented either as pure byte strings or as UTF-8, with the raw bytes > 127 represented as the extended UTF-8 encodings of code points 0x3FFF80 through 0x3FFFFF (values between 0x110000 and 0x3FFF7F are used for other purposes).
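To make the abstract model concrete, here is a rough Python sketch of the code point mapping described above. It is not Emacs's actual implementation: the names `RAW_BYTE_BASE`, `raw_byte_to_codepoint`, `codepoint_to_raw_byte` and `decode_mixed` are made up for illustration, and it leans on Python's `surrogateescape` error handler to find the undecodable bytes. Code points are kept as plain ints because Python's `chr()` stops at 0x10FFFF.

```python
# Hypothetical names; a sketch of the mapping only, not Emacs internals.
RAW_BYTE_BASE = 0x3FFF00  # so byte 0x80 -> 0x3FFF80 and byte 0xFF -> 0x3FFFFF


def raw_byte_to_codepoint(b):
    """Map a raw byte in 0x80..0xFF to its code point in 0x3FFF80..0x3FFFFF."""
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes > 127 have a raw-byte code point")
    return RAW_BYTE_BASE + b


def codepoint_to_raw_byte(cp):
    """Inverse mapping: recover the raw byte from a raw-byte code point."""
    if not 0x3FFF80 <= cp <= 0x3FFFFF:
        raise ValueError("not a raw-byte code point")
    return cp - RAW_BYTE_BASE


def decode_mixed(data):
    """Decode bytes into a list of int code points.

    Valid UTF-8 sequences become ordinary code points; any byte that is not
    part of valid UTF-8 becomes a raw-byte code point instead of an error.
    """
    # surrogateescape turns each undecodable byte 0xNN into U+DC00 + 0xNN.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(raw_byte_to_codepoint(cp - 0xDC00))
        else:
            out.append(cp)
    return out
```

For example, `decode_mixed(b"a\xff\xc3\xa9")` keeps `a` and `é` as ordinary characters and turns the stray `0xFF` into the code point 0x3FFFFF, so the byte stays distinguishable from U+00FF.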