New subject: Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%]

Feb. 14, 2012

On Mon, Feb 13, 2012 at 12:12 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
> ...
> Paul Moore writes:
> ...
>  > I'm now 100% convinced that
>  > encoding="ascii",errors="surrogateescape" is the way to say this in
>  > code.
> ...
> That may also be a good universal default for Python 3, as it will
> pass through non-ASCII text unchanged, while raising an error if the
> program tries to manipulate it (or hand it to a module that
> validates).  (encoding='latin-1' definitely is not a good default.)
> But I'm not sure of that, and the current approach of using the
> preferred system encoding is probably better.

The preferred system encoding is indeed better than universal ASCII.
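A minimal sketch (Python 3) of the behaviour described above. The sample bytes are invented for illustration. Decoding with encoding="ascii", errors="surrogateescape" carries the stray bytes through as lone surrogates and reproduces them exactly on output. A step that validates, here a strict UTF-8 encode, rejects them. latin-1 decodes everything silently and yields mojibake.

```python
# Hypothetical sample input: mostly ASCII, plus a UTF-8 encoded "é" (0xC3 0xA9).
raw = b"caf\xc3\xa9 menu"

# Each undecodable byte becomes a lone surrogate in U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")
print(repr(text))  # 'caf\udcc3\udca9 menu'

# Round trip: the same codec and handler give back the original bytes.
assert text.encode("ascii", errors="surrogateescape") == raw

# Anything that validates (here a strict UTF-8 encode) raises instead of
# quietly producing wrong output.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("rejected:", exc.reason)  # rejected: surrogates not allowed

# latin-1 accepts every byte, so the damage goes unnoticed.
print(repr(raw.decode("latin-1")))  # 'cafÃ© menu'
```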

But is there a good reason not to change the default error handler to
errors="surrogateescape"?

errors="strict" is already well-documented, and the sort of people
most eager to reject (rather than ignore) bad data also tend to be
explicit about their use of defaults.
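For comparison, here is a sketch of what the current default (errors="strict") does with such input, next to an explicit surrogateescape opt-in. The byte string is an invented example.

```python
# Hypothetical input: 0xA3 (a Latin-1 pound sign) is not valid ASCII.
raw = b"price: 10\xa3"

# Current default: errors="strict" rejects the bad byte immediately.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.start, hex(raw[exc.start]))  # 9 0xa3

# Opting in to surrogateescape keeps the data and defers any failure
# until something actually validates the text.
lenient = raw.decode("ascii", errors="surrogateescape")
print(repr(lenient))  # 'price: 10\udca3'
```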

And if the barrier is only backwards-compatibility, is there any
reason not to at least recommend a recipe of errors="surrogateescape"
for cases where you expect ASCII, but want to round-trip other data
just in case?
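A sketch of such a recipe, assuming the file is expected to be ASCII and only its ASCII parts are edited. The file names, contents and the status=ok edit are hypothetical. newline="" is passed so that newline translation cannot alter the bytes on the way through.

```python
import os
import tempfile

# Hypothetical "ASCII" file that happens to contain one stray byte (0xE9).
original = b"name=Jos\xe9\nstatus=ok\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "wb") as f:
        f.write(original)

    # Read as ASCII, smuggling unexpected bytes through as surrogates.
    with open(src, encoding="ascii", errors="surrogateescape", newline="") as f:
        lines = f.readlines()

    # Write back with the same codec and handler; ASCII-only edits are safe
    # and the escaped byte comes out exactly as it went in.
    with open(dst, "w", encoding="ascii", errors="surrogateescape", newline="") as f:
        for line in lines:
            f.write(line.replace("status=ok", "status=done"))

    with open(dst, "rb") as f:
        result = f.read()

print(result)  # b'name=Jos\xe9\nstatus=done\n'
```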

-jJ

Participants (3): Jim Jewett, Carl M. Johnson, Cameron Simpson