[I18n-sig] codecs module, readlines and xreadlines

Martin v. Löwis martin@v.loewis.de
17 Jan 2003 11:15:31 +0100


"M.-A. Lemburg" <mal@lemburg.com> writes:

> I'd say: let the codecs decide what to do here. 

Certainly. Unfortunately, this is not possible at the moment: it is
codecs.open itself that forces binary mode, so the codec has no way of
knowing what the original opening mode was.
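
A minimal sketch of that behaviour, assuming a current Python 3
interpreter rather than the 2003-era one under discussion. The helper
name underlying_mode is hypothetical; it only reports the mode of the
raw file object that codecs.open ends up wrapping. Newer releases may
flag codecs.open as deprecated, so the warning is silenced here.

```python
import codecs
import os
import tempfile
import warnings


def underlying_mode(requested_mode):
    """Return the mode of the raw file that codecs.open() really opened.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings():
            # Some newer Python releases deprecate codecs.open().
            warnings.simplefilter("ignore", DeprecationWarning)
            f = codecs.open(path, requested_mode, encoding="utf-8")
        try:
            # The codec wrapper only holds the already opened stream;
            # the mode the caller asked for is not passed on to it.
            return f.stream.mode
        finally:
            f.close()
    finally:
        os.remove(path)


print(underlying_mode("r"))
```

The caller asked for "r", but the stream handed to the codec was
opened with a "b" appended, and nothing records the original request.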

> After all, codecs.open() only provides an interface to the codecs and
> leaves all the processing to them. If a codec thinks that line ends
> should all be converted to '\n' then so be it. That's also why
> codecs.open() appends a 'b' to the mode in case it is not already
> there: otherwise opening files in e.g.  UTF-16 on Windows would lose
> big.

Again: I do think that it is correct to open the underlying stream in
binary. The question is whether the codec should perform newline
translation (in addition to decoding, and probably after it).
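
To illustrate why the order matters, here is a rough sketch in
present-day Python 3 (an assumption; it is not how any existing codec
behaves). Both function names are hypothetical. The first translates
line ends on the decoded text; the second mimics a byte-level CRLF
translation applied before decoding, which is roughly what a
text-mode read on Windows would do to the raw bytes.

```python
def decode_then_translate(data, encoding):
    """Decode first, then map CRLF and bare CR to LF (hypothetical helper)."""
    text = data.decode(encoding)
    return text.replace("\r\n", "\n").replace("\r", "\n")


def translate_bytes_then_decode(data, encoding):
    """Map CRLF to LF on the raw bytes, then decode (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n").decode(encoding)


# U+0D0A encodes in UTF-16-BE as the two bytes 0x0D 0x0A, which look
# like CR LF to anything working at the byte level.
sample = "\u0d0a".encode("utf-16-be")

# Translating after decoding leaves the character alone.
print(decode_then_translate(sample, "utf-16-be") == "\u0d0a")

# Translating before decoding drops a byte, so decoding fails.
try:
    translate_bytes_then_decode(sample, "utf-16-be")
except UnicodeDecodeError as exc:
    print("byte-level translation broke the data:", exc.reason)
```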

> I think that the codecs.open() kind of treatment is more reliable
> than the open() one for text files. Simply because you always know
> what will happen [...]

This is not really true. The OP complains that you *cannot* know how
line ends will be represented. For the builtin open, you know that
a line end will always be \n in text mode, even more so in universal
mode. With codecs.open, as it is, the representation of a line end in
the Unicode data is platform dependent, which is bad for portability.
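
The difference can be sketched with in-memory streams, again assuming
a current Python 3 interpreter, where io.TextIOWrapper with
newline=None stands in for a text layer with universal newlines. The
sample data and variable names are made up for illustration.

```python
import codecs
import io

# Bytes as a Windows tool might have written them: UTF-16 with CRLF.
raw = "one\r\ntwo\r\n".encode("utf-16")

# A codec stream reader on a binary stream only decodes; the line
# ends come out exactly as they were stored in the file.
reader = codecs.getreader("utf-16")(io.BytesIO(raw))
codec_text = reader.read()

# A text layer with universal newlines decodes and then normalises
# every line end to "\n", whatever the file contained.
text_layer = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16", newline=None)
translated_text = text_layer.read()

print(repr(codec_text))
print(repr(translated_text))
```

Code consuming codec_text has to cope with whatever line-end
convention the producing platform used; code consuming
translated_text does not.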

Regards,
Martin