Re: [Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

May 27, 2011

On Fri, May 27, 2011 at 4:14 PM, INADA Naoki <songofacandy@gmail.com> wrote:
...
> I love Unicode and use it whenever I can.
> But this is a problem in the real world.
> For example, Python 2 is convenient for analyzing line-based logs
> containing several different encodings. Python 3
...deliberately makes that difficult because it is *wrong*.

Binary files containing a mixture of encodings cannot be safely
treated as text. The closest you can get is to support only
ASCII-compatible encodings by decoding the data as ASCII with the
"surrogateescape" error handler, so that bytes with the high-order bit
set are reproduced faithfully on re-encoding. However, such code
can still fail once it encounters an encoding that is not
ASCII-compatible, such as UTF-16 or UTF-32.
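
A minimal sketch of that approach, for illustration only. The helper
names (decode_lossless, encode_lossless) and the sample log lines are
made up for this example; only the "ascii" codec and the
"surrogateescape" error handler are the real mechanism. It filters a
log whose lines use different ASCII-compatible encodings, then shows
one way the same code goes wrong on UTF-16 input:

```python
def decode_lossless(raw: bytes) -> str:
    # Bytes >= 0x80 become lone surrogates U+DC80..U+DCFF instead of
    # raising UnicodeDecodeError.
    return raw.decode("ascii", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # The escaped surrogates are turned back into the original bytes.
    return text.encode("ascii", errors="surrogateescape")


# Hypothetical log: first line is Latin-1, second line is UTF-8.
log = b"GET /caf\xe9 200\n" + "GET /caf\u00e9 404\n".encode("utf-8")

text = decode_lossless(log)

# ASCII-level processing works: keep only the lines mentioning 404.
kept = [line for line in text.split("\n") if "404" in line]
out = encode_lossless("\n".join(kept) + "\n")

# The non-ASCII bytes of the kept line survive untouched.
assert out == "GET /caf\u00e9 404\n".encode("utf-8")
# The whole file also round-trips byte for byte.
assert encode_lossless(text) == log

# UTF-16 is not ASCII-compatible: every ASCII character is followed by
# a NUL byte, so the same substring test silently misses the line.
utf16_log = "GET /a 404\n".encode("utf-16-le")
utf16_text = decode_lossless(utf16_log)
assert "404" not in utf16_text
```

Note that the UTF-16 case does not raise an exception here; the data
still round-trips, but any text-level operation on it (searching,
splitting on newlines) quietly gives the wrong answer.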

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia