Read file that starts with '\xff\xfe'

Colin S. Miller colinsm.spam-me-not at picsel.com
Mon Sep 8 17:41:05 CEST 2003


Bob Gailer wrote:
> At 07:31 AM 9/8/2003, Duncan Booth wrote:
> 
>> Bob Gailer <bgailer at alum.rpi.edu> wrote in
>> news:mailman.1063025195.15280.python-list at python.org:
>>
>> > That's a good start. I presume I need to use codecs.open(filename,
>> > mode[, encoding[, errors[, buffering]]]) to read the file. What is the
>> > actual value of the "encoding[" parameter for "Little-endian UTF-16
>> > Unicode character data, with CR line terminators"
>>
>> Try:
>>
>>  myFile = codecs.open(filename, "r", "utf16")
>>
>> If the file starts with a UTF-16 marker (either little or big endian) it
>> will be read correctly. If it doesn't start with either marker reading
>> from it will throw a UnicodeError.
> 
> 
> Interesting error:
> 
> UnicodeError: UTF-16 decoding error: truncated data
Are you calling readline() on the Unicode file?
I banged my head against this problem a few months ago, and ended up doing
codecs.open(...).read().splitlines()
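A minimal Python 3 sketch of that workaround, assuming a throwaway file named `sample_utf16.txt` (a hypothetical name) written as little-endian UTF-16 with a BOM and CR line terminators, like the file in the original question. Reading the whole file first and letting `str.splitlines()` split the already-decoded text means the byte stream is never cut in the middle of a character:

```python
import codecs
import os
import tempfile

# Hypothetical sample file: BOM (b'\xff\xfe') followed by UTF-16-LE text
# whose lines end in a bare CR, as described in the question.
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "first\rsecond\rthird\r".encode("utf-16-le"))

# The "utf-16" codec uses the BOM to choose the byte order and drops it,
# then splitlines() splits the decoded text on \r, \n or \r\n.
with codecs.open(path, "r", "utf-16") as f:
    lines = f.read().splitlines()

print(lines)  # ['first', 'second', 'third']
```

Because the split happens on characters rather than bytes, CR, LF and CRLF terminators are all handled the same way; the cost is holding the whole file in memory.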

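I think what happens is that the codecs reader's readline() calls the
underlying file's readline(), which knows nothing about Unicode and simply
splits at the first \r or \n byte it finds. In little-endian UTF-16 a
newline is stored as the byte pair 0x0A 0x00, so the split falls between
the two bytes of one character and leaves a string with an odd number of
bytes, which the decoder then rejects as truncated data.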
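The byte-level effect can be reproduced without any file at all. This Python 3 sketch illustrates the mechanism described above; it is not a trace of what the 2003 `codecs` implementation actually did internally. It encodes three short lines as UTF-16-LE and then splits the raw bytes at `b"\n"`, the way a byte-oriented readline would:

```python
# Three lines encoded as UTF-16-LE (no BOM) with LF terminators.
data = "ab\ncd\nef".encode("utf-16-le")
print(data[:6])  # b'a\x00b\x00\n\x00'

# Naive byte split: the 0x00 half of each newline leaks into the next piece.
pieces = data.split(b"\n")
print([len(p) for p in pieces])  # [4, 5, 5]

# A piece with an odd number of bytes cannot be decoded as UTF-16.
try:
    pieces[1].decode("utf-16-le")
except UnicodeDecodeError as exc:
    print(exc.reason)  # truncated data
```

Even the pieces that happen to have an even length would decode to the wrong characters, because every byte pair after the split is misaligned by one.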

Colin Miller






More information about the Python-list mailing list