[Python-ideas] Python 3000 TIOBE -3%

Nick Coghlan ncoghlan at gmail.com
Wed Feb 15 04:22:02 CET 2012

On Wed, Feb 15, 2012 at 12:43 PM, Stephen J. Turnbull
<stephen at xemacs.org> wrote:
> It's arguable that most applications *should* want errors in these
> cases; I've made that argument myself.  But it's quite clearly not the
> user's intent.

However, from a correctness point of view, it's a big step up from
just saying "latin-1" (which effectively turns off *all* of the
additional encoding-related sanity checking Python 3 offers over
Python 2). For many "I don't care about Unicode" use cases, using
"ascii+surrogateescape" for your own I/O and setting
"backslashreplace" on sys.stdout should cover you (and any exceptions
you get will be warning you about cases where your original
assumptions about not caring about Unicode validity have been proven
wrong).
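
A minimal sketch of the combination described above, assuming an
in-memory stream stands in for sys.stdout so the result can be
inspected (the sample bytes and variable names are illustrative only):

```python
import io

# Bytes that are not valid ASCII (here, a Latin-1 encoded e-acute).
raw = b"caf\xe9 menu"

# ascii+surrogateescape passes each non-ASCII byte through as a lone
# surrogate code point instead of raising UnicodeDecodeError.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same codec and handler restores the original bytes,
# so data can be read and written back unchanged.
round_tripped = text.encode("ascii", errors="surrogateescape")

# Stand-in for a sys.stdout configured with backslashreplace: characters
# that cannot be encoded are written as escapes rather than raising.
buffer = io.BytesIO()
out = io.TextIOWrapper(buffer, encoding="ascii", errors="backslashreplace")
out.write(text)
out.flush()
displayed = buffer.getvalue()

# A plain strict encode still fails, which is the warning that the
# "I don't care about Unicode" assumption did not hold for this data.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here the escaped byte shows up on the output stream as a visible
\udce9 sequence, while the round trip through surrogateescape is
lossless.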

If the logging module doesn't do it already, it should probably be
defaulting to backslashreplace when encoding messages, too (for the
same reason sys.stderr already defaults to that - you don't want your
error reporting system failing to encode corrupted Unicode data).
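
One way to get that behaviour explicitly, whatever the logging
module's own defaults are, is to hand a handler a stream that already
uses backslashreplace. This is only a sketch: the logger name is
hypothetical, and a BytesIO object stands in for the real log
destination.

```python
import io
import logging

# Text stream that escapes unencodable characters instead of raising.
# newline="\n" keeps the output identical across platforms.
buffer = io.BytesIO()
stream = io.TextIOWrapper(
    buffer, encoding="ascii", errors="backslashreplace", newline="\n"
)

logger = logging.getLogger("demo.backslashreplace")  # hypothetical name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

# Corrupted input: a stray 0xff byte decoded with surrogateescape.
corrupted = b"\xff".decode("ascii", errors="surrogateescape")
logger.info("bad data: %s", corrupted)
handler.flush()

logged = buffer.getvalue()
```

The message is still recorded, with the undecodable byte shown as an
escape, rather than the logging call itself failing to encode it.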

sys.stdin and sys.stdout are different due to the role they play in
pipeline processing - for those,
locale.getpreferredencoding()+"strict" is a more reasonable default
(but we should make it easy to replace them with something more
specific for a given application, hence
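
As a rough illustration of replacing a standard stream with something
more specific, the sketch below wraps a binary stream with an explicit
encoding and error handler. The rewrap() helper is hypothetical and not
part of the standard library, and a BytesIO object stands in for
sys.stdout.buffer.

```python
import io
import locale


def rewrap(binary_stream, encoding=None, errors="strict"):
    """Hypothetical helper: wrap a binary stream as text with explicit settings."""
    if encoding is None:
        # Fall back to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)


# In a real program this would be something like:
#     sys.stdout = rewrap(sys.stdout.buffer, encoding="utf-8")
demo = io.BytesIO()
out = rewrap(demo, encoding="utf-8")
out.write("na\u00efve")
out.flush()
written = demo.getvalue()

# With "strict", text that does not fit the chosen encoding raises
# instead of being silently altered.
strict_out = rewrap(io.BytesIO(), encoding="ascii")
try:
    strict_out.write("na\u00efve")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding=..., errors=...)
changes these settings in place without building a new wrapper.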


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
