On Wed, Feb 15, 2012 at 12:43 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
> It's arguable that most applications *should* want errors in these cases; I've made that argument myself. But it's quite clearly not the user's intent.
However, from a correctness point of view, it's a big step up from just saying "latin-1", which effectively turns off *all* of the additional encoding-related sanity checking that Python 3 offers over Python 2.

For many "I don't care about Unicode" use cases, using "ascii+surrogateescape" for your own I/O and setting "backslashreplace" on sys.stdout should cover you. Any exceptions you do get will be warning you about cases where your original assumption (that you didn't need to care about Unicode validity) has been proven wrong.

If the logging module doesn't do it already, it should probably default to backslashreplace when encoding messages, too. The reason is the same as the reason sys.stderr already defaults to it: you don't want your error reporting system failing to encode corrupted Unicode data.

sys.stdin and sys.stdout are different because of the role they play in pipeline processing. For those, locale.getpreferredencoding()+"strict" is a more reasonable default. However, we should make it easy to replace them with something more specific for a given application, hence http://bugs.python.org/issue14017.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
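For illustration, here is a minimal sketch of the "ascii+surrogateescape in, backslashreplace out" combination described above. It uses only the standard codec error handlers. The helper names (decode_lenient, encode_lenient, display_stream, make_logger) and the logger name "lenient-demo" are hypothetical. An in-memory io.BytesIO stands in for a real file or pipe. On Python 3.7 and later, sys.stdout.reconfigure(errors="backslashreplace") applies the same handler to the real stdout in place.

```python
import io
import logging


def decode_lenient(raw: bytes) -> str:
    # "ascii+surrogateescape": ASCII bytes decode normally; every byte >= 0x80
    # becomes a lone surrogate (U+DC80..U+DCFF) instead of raising.
    return raw.decode("ascii", errors="surrogateescape")


def encode_lenient(text: str) -> bytes:
    # The same handler on the way out restores the original bytes exactly.
    return text.encode("ascii", errors="surrogateescape")


def display_stream(binary: io.BufferedIOBase) -> io.TextIOWrapper:
    # A text layer that never fails to encode: anything outside ASCII,
    # including smuggled surrogates, is written as a backslash escape.
    # newline="\n" disables platform line-ending translation.
    return io.TextIOWrapper(binary, encoding="ascii",
                            errors="backslashreplace", newline="\n")


def make_logger(binary: io.BufferedIOBase) -> logging.Logger:
    # Hypothetical helper: the handler writes through a backslashreplace
    # stream, so corrupted data cannot break error reporting.
    handler = logging.StreamHandler(display_stream(binary))
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
    logger = logging.getLogger("lenient-demo")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    raw = b"caf\xe9 ok"               # not valid ASCII (or UTF-8)
    text = decode_lenient(raw)        # no exception raised
    assert encode_lenient(text) == raw

    buf = io.BytesIO()
    out = display_stream(buf)
    out.write(text + "\n")
    out.flush()
    print(buf.getvalue())             # b'caf\\udce9 ok\n'
```

The round trip through surrogateescape is lossless, so data you merely pass along is untouched. Only the display and logging paths use backslashreplace, where readable escapes are preferable to an exception.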