On 11 February 2012 21:24, Paul Moore <p.f.moore@gmail.com> wrote:
> What I *don't* know is what those funny bits of
> mojibake I see in the text editor are.
So, do yourself and us, "the rest of the world", a favor, and open the file in binary mode.
It used to be just "text" back when getting non-[A-Za-z] text meant having someone record a file on a tape, pack it in their luggage, and take a plane "overseas" to the U.S.A. That is not the case anymore, and that, as far as I understand, is the reasoning behind Python 3 defaulting to Unicode.
Anyone can work "ignoring text" and treat bytes as bytes by opening a file in binary mode. You can use "os.linesep" instead of a hard-coded "\n" to deal with line breaks. (Of course you might accidentally break a line inside a multi-byte character in some encoding, since you prefer to ignore them altogether, but it should be rare.)
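
A minimal sketch of what I mean, under a few assumptions: the helper name read_lines_as_bytes and the temporary file are made up for illustration, and since "os.linesep" is a str in Python 3 it has to be encoded before it can be used to split bytes. The last lines show the rare failure case mentioned above, where a newline byte is really half of a UTF-16 character.

```python
import os
import tempfile

# os.linesep is a str in Python 3; encode it so it can split bytes.
LINESEP = os.linesep.encode("ascii")


def read_lines_as_bytes(path):
    """Hypothetical helper: return the file's lines as bytes, never decoding."""
    with open(path, "rb") as f:
        data = f.read()
    return data.split(LINESEP)


# Write two lines, the first containing non-ASCII bytes (UTF-8 for "café").
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xc3\xa9" + LINESEP + b"second line")
    path = tmp.name

lines = read_lines_as_bytes(path)
os.remove(path)

# The "funny bits" stay as opaque bytes; nothing is decoded or mangled.
print(lines)

# The rare failure case: in UTF-16-LE, U+010A is encoded with a 0x0A byte,
# so a naive byte-level split on b"\n" would cut that character in half.
encoded = "\u010a".encode("utf-16-le")
print(b"\n" in encoded)
```

Nothing here needs to know the file's encoding, which is the whole point. The price is that the split is purely byte-level.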
js
-><-