On Mon, Feb 13, 2012 at 3:04 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
> The ASCII speakers are a pretty clear-cut case. Using 'latin-1' as the codec, almost all things they can do with a 100% ASCII program and a sanely-encoded text (which leaves out Shift JIS, Big 5, and maybe some obsolete Vietnamese encodings, but not much else AFAIK) will pass through the non-ASCII verbatim, or delete it.
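The two decoding strategies weighed in this thread, the "latin-1" codec versus the "ascii" codec with the "surrogateescape" error handler, can be contrasted with a minimal Python 3 sketch. It assumes the input happens to be UTF-8, chosen only as an example of an ASCII-compatible encoding the program does not know about; the sample string and variable names are illustrative.

```python
# Sample input: UTF-8 bytes for "CAFÉ". The program is assumed NOT to know
# the real encoding, only that it is ASCII-compatible.
data = "CAFÉ".encode("utf-8")  # b'CAF\xc3\x89'

# Option 1: decode as latin-1. Every byte maps to exactly one code point,
# so decoding never fails and an unmodified round trip is lossless.
as_latin1 = data.decode("latin-1")
assert as_latin1.encode("latin-1") == data

# ...but treating the decoded text as real characters can silently produce
# bytes that are no longer valid in the true encoding: lower() turns the
# UTF-8 lead byte 0xC3 ('Ã') into 0xE3 ('ã').
lowered = as_latin1.lower().encode("latin-1")
print(lowered)  # b'caf\xe3\x89' (not valid UTF-8)

# Re-encoding with a different codec also "succeeds", yielding mojibake.
print(as_latin1.encode("utf-8"))  # b'CAF\xc3\x83\xc2\x89'

# Option 2: decode as ascii with surrogateescape. Each non-ASCII byte
# becomes a lone surrogate (U+DC80..U+DCFF) that encodes back to that byte.
as_escaped = data.decode("ascii", "surrogateescape")
assert as_escaped.encode("ascii", "surrogateescape") == data

# Emitting the escaped text through a codec that does not use the same
# error handler raises instead of quietly writing corrupted output.
try:
    as_escaped.encode("utf-8")
except UnicodeEncodeError as exc:
    print("refused:", exc.reason)
```

In both cases the ASCII portion is handled identically; the difference is only whether a mistake with the non-ASCII bytes surfaces as silent corruption or as a UnicodeEncodeError.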
I'd hazard a guess that the non-ASCII-compatible encoding most likely to be encountered outside Asia is UTF-16.

The choice is really between "never give me UnicodeErrors, but feel free to silently corrupt the data stream if I do the wrong thing with that data" (i.e. "latin-1") and "correctly handle any ASCII-compatible encoding, but still throw UnicodeEncodeError if I'm about to emit corrupted data" ("ascii+surrogateescape").

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia