[Python-ideas] Add "has_surrogates" flags to string object
random832 at fastmail.us
Tue Oct 8 17:27:52 CEST 2013
On Tue, Oct 8, 2013, at 7:58, Masklinn wrote:
> I don't know the details of the flexible string representation, but I
> believed the names fit what was actually in memory. UCS2 does not
> have surrogate pairs, thus surrogate codes make no sense in UCS2,
> they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not
> codepoints, they have no reason to appear in either UCS2 or UCS4
> outside of encoding errors.
They can also occur when slicing a ctypes unicode buffer, through PEP
383 (the surrogateescape error handler), or from native UTF-16 filenames
that contain invalid surrogates. The latter two also create situations
where you need to generate them.
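
As a rough illustration of the PEP 383 case (a minimal sketch, not taken from any proposed patch): decoding undecodable bytes with the surrogateescape error handler produces lone surrogates in an ordinary str, and encoding with the same handler has to turn them back into the original bytes. The helper name has_surrogates below is hypothetical, borrowed from the subject line, and is just a linear scan rather than a cached flag.

```python
def has_surrogates(s):
    """Hypothetical helper: True if s contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


# Latin-1 encoded bytes; the trailing 0xE9 is not valid UTF-8.
raw = b"caf\xe9"

# PEP 383: each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# A lone surrogate can also be created directly.
lone = chr(0xD800)
```

Here text ends in U+DCE9, so has_surrogates(text) is true even though no encoding error was ever reported to the caller.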