On 29.08.2014 02:41, Stephen J. Turnbull wrote:
> In the process of booking up for my other post in this thread, I noticed the 'surrogatepass' handler.
> Is there a real use case for the 'surrogatepass' error handler? It seems like a horrible break in the abstraction. IMHO, if there's a need, the application should handle this. Python shouldn't provide it on encoding as the resulting streams are not Unicode conformant, nor on decoding UTF-16, as conversion of surrogate pairs is a requirement of all Unicode versions since about 1995.
This error handler allows applications to reactivate, in Python 3, the Python 2 style behavior of the UTF codecs, which allowed reading lone surrogates on input.
Since Python allows working with lone surrogates in Unicode (they are valid code points) and we're using UTF-8 for marshal, we needed a way to make sure that Python 3 also optionally supports working with lone surrogates in such UTF-8 streams (nowadays called CESU-8: http://en.wikipedia.org/wiki/CESU-8).
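To illustrate the behavior described above, here is a minimal sketch, assuming a current CPython 3 interpreter. The variable names are only illustrative. It shows that the strict UTF-8 codec rejects a lone surrogate in both directions, that 'surrogatepass' round-trips it, and that marshal preserves such a string.

```python
import marshal

lone = "\ud800"  # a lone high surrogate: a valid code point, but not a valid scalar value

# The default (strict) UTF-8 codec refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# With 'surrogatepass' the surrogate is written as a three-byte sequence.
encoded = lone.encode("utf-8", "surrogatepass")

# The strict decoder rejects those bytes again ...
try:
    encoded.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# ... while 'surrogatepass' restores the original string.
decoded = encoded.decode("utf-8", "surrogatepass")

# marshal round-trips strings containing lone surrogates.
restored = marshal.loads(marshal.dumps(lone))

print(strict_encode_ok, strict_decode_ok, encoded, decoded == lone, restored == lone)
```

Applications that do not need lone surrogates can simply keep the default strict handler and will never see such byte sequences.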