UTF-16-LE and split() under MS-Windows XP

Thu Jul 10 13:25:52 EDT 2003

Martin v. Löwis wrote:
> "Colin S. Miller" <colinsm.spam-me-not at picsel.com> writes:
> 
> 
>>Where have I gone wrong, and what is the correct method
>>to verify the BOM mark?
> 
> 
> readline is not supported in the UTF-16 codec. You have to read the
> entire file, and perform .split. Looking at the BOM should not be
> necessary, as the UTF-16 codec will do so on its own.

Is there any reason why readline() isn't supported?
AFAIK, the preferred Unicode line terminators are
0x2028 (line separator)
0x2029 (paragraph separator)
but 0x0A (line feed) and 0x0D (carriage return) are
also supported for legacy reasons.
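
As a quick check of the terminators listed above, here is a minimal
sketch, assuming a current Python 3 where str.splitlines() works on
Unicode text. The sample string and the constant names are made up
for illustration.

```python
# Code points of the line terminators discussed above.
LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"
LINE_FEED = "\n"         # 0x0A (decimal 10)
CARRIAGE_RETURN = "\r"   # 0x0D (decimal 13)

# Hypothetical sample text mixing all of the terminators.
sample = "one\u2028two\u2029three\nfour\r\nfive\rsix"

# splitlines() treats each of them (and the CR LF pair) as one line break.
lines = sample.splitlines()
```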

I'm using
file.read().splitlines() now, but am slightly worried
about performance/memory when there are a few hundred lines.
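
One way around the memory worry might be to decode the file in
chunks. The following is only a sketch, assuming a Python new enough
to have codecs.getincrementaldecoder (it is not available in every
version); iter_utf16_lines and chunk_size are hypothetical names,
not part of any library.

```python
import codecs

def iter_utf16_lines(path, chunk_size=4096):
    """Yield lines (terminators stripped) from a UTF-16 file, chunk by chunk."""
    # The "utf-16" decoder reads the BOM itself to pick the byte order.
    decoder = codecs.getincrementaldecoder("utf-16")()
    pending = ""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            # An empty chunk means end of file, so flush the decoder.
            pending += decoder.decode(chunk, final=not chunk)
            pieces = pending.splitlines(True)  # keep the terminators for now
            if chunk:
                # Hold back the last piece: it may be an unfinished line,
                # or end in "\r" whose "\n" arrives in the next chunk.
                pending = pieces.pop() if pieces else ""
            else:
                pending = ""
            for piece in pieces:
                # Strip the terminator from each complete line.
                yield piece.splitlines()[0]
            if not chunk:
                break
```

Iterating with "for line in iter_utf16_lines(name):" then keeps only
one chunk plus the current partial line in memory at a time.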

TIA,
Colin S. Miller

> 
> Regards,
> Martin
>