[Python-ideas] Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%]

Tue Feb 14 21:04:05 CET 2012

On Mon, Feb 13, 2012 at 12:12 AM, Stephen J. Turnbull
<stephen at xemacs.org> wrote:
> Paul Moore writes:

>  > I'm now 100% convinced that
>  > encoding="ascii",errors="surrogateescape" is the way to say this in
>  > code.

> That may also be a good universal default for Python 3, as it will
> pass through non-ASCII text unchanged, while raising an error if the
> program tries to manipulate it (or hand it to a module that
> validates).  (encoding='latin-1' definitely is not a good default.)
> But I'm not sure of that, and the current approach of using the
> preferred system encoding is probably better.

The preferred system encoding is indeed better than universal ASCII.

But is there a good reason not to change the default error handler to
errors="surrogateescape"?

errors="strict" is already well documented, and the sort of people
most eager to reject (rather than pass through) bad data also tend to
spell out their choices explicitly rather than rely on defaults.

And if the barrier is only backwards-compatibility, is there any
reason not to at least recommend a recipe of errors="surrogateescape"
for cases where you expect ASCII, but want to round-trip other data
just in case?
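
A minimal sketch of what such a recipe might look like, assuming
Python 3.1 or later (where PEP 383 added the surrogateescape handler).
The helper names decode_lenient and encode_lenient are hypothetical,
chosen only for illustration:

```python
def decode_lenient(raw: bytes) -> str:
    """Decode as ASCII, smuggling each non-ASCII byte through as a
    lone surrogate (U+DC80..U+DCFF) instead of raising."""
    return raw.decode("ascii", errors="surrogateescape")


def encode_lenient(text: str) -> bytes:
    """Inverse of decode_lenient: turn escaped surrogates back into
    the original bytes."""
    return text.encode("ascii", errors="surrogateescape")


# Sample input: UTF-8 bytes for "café", which are not valid ASCII.
raw = b"caf\xc3\xa9 menu\n"

text = decode_lenient(raw)
# The ASCII part is ordinary text; the two stray bytes are escaped.
assert text == "caf\udcc3\udca9 menu\n"

# Writing the string back out restores the input byte for byte.
assert encode_lenient(text) == raw

# A strict codec still refuses the escaped data, so it cannot leak
# silently into code that validates.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    strict_rejects = True
else:
    strict_rejects = False
assert strict_rejects
```

The same pair of arguments can be passed to open(), e.g.
open(path, encoding="ascii", errors="surrogateescape"), if the data
comes from a file rather than a bytes object.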

-jJ