Readlines returns non ASCII character
python at mrabarnett.plus.com
Thu Sep 24 04:02:21 CEST 2015
On 2015-09-24 02:37, Ian Kelly wrote:
> On Wed, Sep 23, 2015 at 6:09 PM, MRAB <python at mrabarnett.plus.com> wrote:
>> On 2015-09-24 00:51, paul.hermeneutic at gmail.com wrote:
>>> If this starts at the beginning of the file, then it indicates that
>>> the file is UTF-16 (LE).
>>> Encoding      Hex bytes     Decimal
>>> UTF-8         EF BB BF      239 187 191
>>> UTF-16 (BE)   FE FF         254 255
>>> UTF-16 (LE)   FF FE         255 254
>>> UTF-32 (BE)   00 00 FE FF   0 0 254 255
>>> UTF-32 (LE)   FF FE 00 00   255 254 0 0
>> The "signature" EF BB BF indicates the encoding that Python calls
>> "utf-8-sig". It's mostly seen in files written on Windows.
>> If the file doesn't start with any of these, then it could be using any
>> encoding (except UTF-16 or UTF-32).
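For illustration, a minimal Python sketch of the table above (the name
`sniff_bom` is hypothetical, not a standard API). It uses the standard
`codecs.BOM_*` constants and returns a codec name that `bytes.decode`
accepts: the "utf-16" and "utf-32" codecs consume the BOM and use its byte
order, and "utf-8-sig" strips the UTF-8 signature. Because the UTF-32 (LE)
BOM begins with the UTF-16 (LE) one, the longer signatures are tested first.

```python
import codecs

# Longest signatures first: the UTF-32 (LE) BOM, FF FE 00 00, starts
# with the UTF-16 (LE) BOM, FF FE, so it has to be checked before it.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data):
    """Return a codec name for the BOM at the start of `data`, or None."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None


raw = b"\xff\xfeh\x00i\x00"          # UTF-16 (LE) BOM followed by "hi"
print(raw.decode(sniff_bom(raw)))   # the "utf-16" codec drops the BOM: hi
```

A None result corresponds to the case above: with no signature, the
encoding has to be worked out some other way.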
> Yes, but what does it mean when the signature is 00 FF 00 FE 00 FF and
> occurs not at the beginning but repeatedly throughout the file, as
> appears in the OP's case?
> At least, I'm assuming that the high-order bytes are 00 based on what
> the OP posted. I wouldn't be surprised though if they're just being
> mangled by the terminal, if it happens to be a certain one that will
> not be named but uses CP 1252.
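One way to check that reading is to open the file in binary mode
(`data = open(path, "rb").read()`, with `path` standing in for the OP's
file) and list every offset at which a given byte sequence occurs. A minimal
sketch; `find_all` is a hypothetical helper name. The pattern could be the
UTF-16 (LE) BOM, b"\xff\xfe", or the sequence guessed above, written as
`bytes.fromhex("00 FF 00 FE 00 FF")`. If BOMs turn up repeatedly, one
plausible but unconfirmed explanation is that several UTF-16 chunks, each
with its own BOM, were joined into one file.

```python
def find_all(data, pattern):
    """Return every offset in `data` where `pattern` starts (overlaps allowed)."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        # Resume one byte later so overlapping matches are also found.
        pos = data.find(pattern, pos + 1)
    return offsets


# Two UTF-16 (LE) chunks, "a" and "b", each carrying its own BOM.
sample = b"\xff\xfea\x00" + b"\xff\xfeb\x00"
print(find_all(sample, b"\xff\xfe"))   # [0, 4]
```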
Yes, a byte-string literal or a hex dump of, say, the first 256 bytes
would've been better.
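Along those lines, a minimal sketch of producing both from Python: read the
first 256 bytes in binary mode, so nothing is decoded or sent through the
terminal's code page as text, then print them as a bytes literal and as a
hex dump. The helper names `hex_dump` and `show_head` and the
16-bytes-per-line layout are illustrative choices, not a standard format.

```python
def hex_dump(data, width=16):
    """Format bytes as lines of: offset, hex bytes, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def show_head(path, n=256):
    """Print the first n bytes of a file as a bytes literal and as a hex dump."""
    # Binary mode: the bytes are never decoded, so nothing gets mangled.
    with open(path, "rb") as f:
        head = f.read(n)
    print(repr(head))
    print(hex_dump(head))
```

The repr() output is pure ASCII, so it can be pasted into a message and
turned back into the exact bytes with a Python literal.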