[Python-ideas] Add "has_surrogates" flags to string object

random832 at fastmail.us
Tue Oct 8 17:27:52 CEST 2013


On Tue, Oct 8, 2013, at 7:58, Masklinn wrote:
> I don't know the details of the flexible string representation, but I
> believed the names fit what was actually in memory. UCS2 does not
> have surrogate pairs, thus surrogate codes make no sense in UCS2,
> they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not
> codepoints, they have no reason to appear in either UCS2 or UCS4
> outside of encoding errors.

They can also occur when slicing a ctypes unicode buffer, through PEP
383 (the surrogateescape error handler), or through native UTF-16
filenames that contain invalid surrogates. The latter two also create
situations where you need to generate them.
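
To make the PEP 383 case concrete, here is a minimal sketch, assuming
Python 3.3 or later. The has_surrogates() helper and the sample
filename are illustrative only; they are not an existing API or the
flag proposed in this thread.

```python
def has_surrogates(s):
    """Hypothetical helper: True if s contains any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

# A Latin-1 encoded filename; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# PEP 383: the undecodable byte is mapped to the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes,
# which is why the surrogate has to be generated in the first place.
roundtrip = name.encode("utf-8", "surrogateescape")

# A strict encode rejects the lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

A scan like has_surrogates() is linear in the string length, which is
the cost a cached flag on the string object would avoid.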

