[Python-Dev] surrogatepass - she's a witch, burn 'er! [was: Cleaning up ...]

M.-A. Lemburg mal at egenix.com
Fri Aug 29 09:48:50 CEST 2014

On 29.08.2014 02:41, Stephen J. Turnbull wrote:
> In the process of booking up for my other post in this thread, I
> noticed the 'surrogatepass' handler.
> Is there a real use case for the 'surrogatepass' error handler?  It
> seems like a horrible break in the abstraction.  IMHO, if there's a
> need, the application should handle this.  Python shouldn't provide
> it on encoding as the resulting streams are not Unicode conformant,
> nor on decoding UTF-16, as conversion of surrogate pairs is a
> requirement of all Unicode versions since about 1995.

This error handler lets applications re-enable, in Python 3, the
Python 2 style behavior of the UTF codecs, which allowed reading
lone surrogates on input.
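
As a rough illustration of that behavior (a minimal sketch, assuming
CPython 3; the variable names are only for this example), the default
'strict' handler rejects a lone surrogate in both directions, while
'surrogatepass' lets it through the UTF-8 codec:

```python
# A lone high surrogate: a valid code point, but not encodable as
# conformant UTF-8.
lone = "\ud800"

# The default 'strict' handler refuses to encode it.
try:
    lone.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# 'surrogatepass' writes the three-byte form instead of raising.
encoded = lone.encode("utf-8", "surrogatepass")

# Decoding those bytes also needs 'surrogatepass'; 'strict' rejects them.
try:
    encoded.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

decoded = encoded.decode("utf-8", "surrogatepass")

print(strict_encode_ok, strict_decode_ok, encoded, decoded == lone)
```

The handler is opt-in per call, so code that does not ask for it keeps
the strict behavior.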

Since Python allows working with lone surrogates in Unicode (they
are valid code points) and we're using UTF-8 for marshal, we needed
a way to make sure that Python 3 also optionally supports working
with lone surrogates in such UTF-8 streams (nowadays called CESU-8:



for discussions.
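
The marshal case can be checked with a short round trip. This is only
a sketch, assuming CPython 3, where marshal is understood to serialize
str objects as UTF-8 with 'surrogatepass'; the sample string is made up
for the example:

```python
import marshal

# A string with a lone low surrogate embedded in ordinary text.
text = "abc\udc80def"

# marshal has to serialize every str object, including ones that
# hold lone surrogates, and restore them unchanged.
payload = marshal.dumps(text)
restored = marshal.loads(payload)

print(restored == text)
```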

Marc-Andre Lemburg
