[issue8438] Codecs: "surrogateescape" error handler in Python 2.7

M.-A. Lemburg mal at egenix.com
Mon Apr 19 11:16:30 CEST 2010


Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti at gmail.com> added the comment:
> 
>> I consider this an important missing backport for 2.7, since
>> without this handler, the UTF-8 codecs in 2.7 and 3.x are
>> incompatible and there's no other way to work around this
>> other than to make use of the errorhandler conditionally
>> depend on the Python version.
> 
> FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629 while working on #8271, and found this difference in the handling of surrogates (they are invalid only on 3.x).
> I didn't change the behavior of the codec in the patch I attached to #8271 because it was out of the scope of the issue, but I consider the fact that surrogates can be encoded in Python 2.x a bug, because it doesn't follow RFC 3629.
> IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec, however I haven't had time yet to investigate how Python 3 handles this and what the best solution is (e.g. adding another codec or changing the default behavior).

We have good reasons to allow lone surrogates in the UTF-8
codec.

Please remember that Python is a programming language
meant to allow writing applications, which also includes constructing
Unicode data from scratch, rather than an application which is
only meant to work with UTF-8 data.

Also note that lone surrogates were considered valid UTF-8 at the
time of adding Unicode support to Python and many years after that.

Since the codec is used in lots of applications, adopting the
Unicode Consortium's change in 2.7 is not possible without
breaking them.

This is why the change was made only in the 3.x branch, and even
there only together with the additional surrogatepass handler to
get back the old behavior where needed.
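
As a rough illustration of the 3.x behavior described above (a minimal
sketch for Python 3 only; the variable names are made up for the
example): the strict UTF-8 codec rejects a lone surrogate, while the
surrogatepass handler encodes and decodes it the way the 2.x codec
does by default.

```python
# Python 3 sketch: strict UTF-8 vs. the "surrogatepass" error handler.
lone = "\ud800"  # an unpaired (lone) high surrogate

# The default (strict) UTF-8 codec in 3.x refuses to encode it.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" restores the old behavior in both directions.
raw = lone.encode("utf-8", "surrogatepass")
roundtrip = raw.decode("utf-8", "surrogatepass")

print(strict_ok)          # False
print(raw)                # b'\xed\xa0\x80'
print(roundtrip == lone)  # True
```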

But this is getting off-topic for the issue in question... I'll
open a new ticket for the backports.

-- 
Marc-Andre Lemburg
eGenix.com
