[Python-Dev] file() vs open(), round 7

"Martin v. Löwis" martin at v.loewis.de
Tue Dec 27 18:54:30 CET 2005


M.-A. Lemburg wrote:
>>Here's a rough draft:
>>
>>    def textopen(name, mode="r", encoding=None):
>>        if "U" not in mode:
>>            mode += "U"
> 
> 
> The "U" is not needed when opening files using codecs -
> these always break lines using .splitlines() which
> breaks lines according to the Unicode rules and also
> knows about the various line break variants on different
> platforms.

Still, codecs typically don't implement universal newlines
correctly. If you specify 'U', then do .read(), you deserve
to get \n (U+000A) as the line separator; with most codecs,
you get whatever line breaks were in the file.
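
To illustrate the point, here is a minimal sketch (written for
current Python 3, not the 2005 interpreter; the sample text and
variable names are made up for the example). It reads mixed line
endings through a codec StreamReader and shows that .read() hands
them back unchanged, so any normalization to \n has to be done
separately on the decoded text:

```python
import codecs
import io

# Hypothetical sample data mixing Windows, old Mac and Unix line endings.
raw = "one\r\ntwo\rthree\n".encode("utf-8")

# Wrap the byte stream in the codec's StreamReader, as codecs.open() would.
reader = codecs.getreader("utf-8")(io.BytesIO(raw))
text = reader.read()

# The codec only decodes; the original line breaks come through as-is.
print(repr(text))

# Universal-newline behaviour has to be applied after decoding.
normalized = text.replace("\r\n", "\n").replace("\r", "\n")
print(repr(normalized))
```

Note that the normalization runs on decoded characters, not on the
raw bytes.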

Passing 'U' to the underlying stream is wrong, as well:
if the stream is double-byte oriented (e.g. UTF-16),
the 'U' filtering will rarely do anything, but if it does
something, it will be wrong.
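
A small sketch of the failure mode, again assuming Python 3. The
helper byte_level_universal_newlines is hypothetical: it only
simulates what a byte-oriented 'U' filter would do, and is not the
actual implementation in the file object:

```python
def byte_level_universal_newlines(data: bytes) -> bytes:
    """Simulate newline translation done on raw bytes (an assumption
    made for this example, not the real file-object code)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# U+010D encodes in UTF-16-LE as the bytes 0D 01; the first byte
# looks like a carriage return to a byte-level filter.
original = "\u010d"
encoded = original.encode("utf-16-le")
damaged = byte_level_universal_newlines(encoded).decode("utf-16-le")

# The character has silently turned into a different one (U+010A).
print(repr(original), "->", repr(damaged))

# Text that already uses plain \n passes through the filter unchanged.
untouched = byte_level_universal_newlines(
    "a\nb".encode("utf-16-le")
).decode("utf-16-le")
print(repr(untouched))
```

The filter cannot tell a real line break from a byte that merely
has the value 0x0D, which is why it has to operate on decoded text.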

I agree that it would be desirable to have textopen always
default to universal newlines; however, this is difficult
to implement.

Regards,
Martin

