UTF-16-LE and split() under MS-Windows XP

Martin v. Löwis martin at v.loewis.de
Thu Jul 10 22:04:17 CEST 2003


"Colin S. Miller" <colinsm.spam-me-not at picsel.com> writes:

> Is there any reason why readline() isn't supported?

Because it hasn't been implemented. The naive approach of calling the
readline of the underlying stream (as all other codecs do) does not
work for UTF-16.
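
To illustrate why the byte-level approach breaks, here is a minimal sketch in current Python 3 (not the 2003 codec code). It uses io.BytesIO as a stand-in for the underlying stream; the variable names are made up for the example.

```python
import io

# "ab\ncd" in UTF-16-LE: every character occupies two bytes.
data = "ab\ncd".encode("utf-16-le")   # b'a\x00b\x00\n\x00c\x00d\x00'

# A byte-oriented readline() stops right after the 0x0A byte,
# which is only the first half of the two-byte "\n" code unit.
first = io.BytesIO(data).readline()   # b'a\x00b\x00\n'

try:
    first.decode("utf-16-le")
    decoded_ok = True
except UnicodeDecodeError:
    # Odd number of bytes: the line ends in a truncated code unit.
    decoded_ok = False

# A 0x0A byte can also occur inside an unrelated character:
# U+010A is encoded as the bytes 0A 01 in UTF-16-LE.
false_break = "x\u010Ay".encode("utf-16-le")
pieces = io.BytesIO(false_break).readlines()  # split although there is no newline
```

So the underlying stream both cuts code units in half and sees line breaks where there are none.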

> AFAIK, the preferred Unicode standard line endings are 0x2028 (Line
> Separator) and 0x2029 (Paragraph Separator), but 0x0A (line feed) and
> 0x0D (carriage return) are also supported for legacy reasons.

That is a further complication on top of the first one. One should
support all line-breaking characters for UTF-16, at least in Universal
Newline (U) mode.
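
As a small illustration of "all line-breaking characters": in current Python 3, str.splitlines() on already-decoded text treats the Unicode separators and the legacy control characters alike. The sample string is made up for the example.

```python
# U+2028, U+2029, LF, CR and CR LF all end a line once the text is decoded.
text = "a\u2028b\u2029c\nd\re\r\nf"
lines = text.splitlines()
```

Here lines comes out as six one-character strings, "a" through "f".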

> I'm using file.read().splitlines() now, but am slightly worried
> about performance/memory when there are a few hundred lines.

Feel free to implement and contribute a patch. It has been that way
for some years now, and it likely will stay the same for the coming
years unless somebody contributes a patch.
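
For what it is worth, one possible shape for such a line reader is sketched below. It is written for current Python 3 and is not the codec's actual implementation. It decodes fixed-size chunks incrementally and splits the decoded text, so the whole file never has to be in memory. The name iter_lines and its parameters are hypothetical.

```python
import codecs
import io

def iter_lines(stream, encoding="utf-16-le", chunk_size=4096):
    """Yield decoded lines (terminators kept) from a binary stream."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        # The incremental decoder buffers a trailing half code unit
        # until the next chunk arrives.
        pending += decoder.decode(chunk, final=not chunk)
        lines = pending.splitlines(keepends=True)
        if not chunk:
            # End of input: whatever is left is the final line.
            yield from lines
            return
        # Hold back the last piece: it may be an incomplete line, or
        # end in "\r" whose "\n" is still in the next chunk.
        pending = lines.pop() if lines else ""
        yield from lines

# Usage: a tiny chunk size forces splits in the middle of code units.
sample = "one\r\ntwo\u2028three\u2029four\nfive".encode("utf-16-le")
result = list(iter_lines(io.BytesIO(sample), chunk_size=3))
```

Always holding back the last piece costs a little latency, but it keeps "\r\n" pairs together across chunk boundaries.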

Regards,
Martin




