Neil Hodgson wrote:
... and because of this, the feature is already available if you use codecs.open() instead of the built-in open():
So should I not add an issue for the basic open because codecs.open should be used for this case?
As Antoine mentioned, using codecs.open() with .readline() is about 20-30 times slower than the built-in open().
This is mainly due to the fact that the codec's .readline() method is implemented in pure Python and does its own buffering.
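To illustrate the difference in behavior, here is a minimal sketch. It uses in-memory streams instead of real files, and the names SAMPLE, lines_via_io and lines_via_codecs are made up for this example. It assumes the codec reader splits on the same characters as str.splitlines(), while the io layer only treats \n, \r and \r\n as line ends.

```python
import codecs
import io

# Sample text containing LINE SEPARATOR (U+2028) and NEXT LINE (U+0085)
# in addition to plain "\n" line ends.
SAMPLE = "a\u2028b\nc\x85d\n".encode("utf-8")

def lines_via_io(data):
    """Read lines through the io layer, as the built-in open() does."""
    return list(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8"))

def lines_via_codecs(data):
    """Read lines through the pure-Python codec StreamReader."""
    reader = codecs.getreader("utf-8")(io.BytesIO(data))
    return list(reader)

print(lines_via_io(SAMPLE))      # two lines, split on "\n" only
print(lines_via_codecs(SAMPLE))  # four lines, split on every Unicode line break
```

The io layer returns two lines here, the codec reader four.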
IMHO, it would be a lot better to add full Unicode support for line breaks to the io layer. Given that the code for the complicated handling of the CRLF combination is already there, it's not difficult to add support for the remaining line break characters.
The implementation could reuse the Bloom filter approach used in unicodeobject.c to make this very fast.
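A rough Python sketch of the idea, not the actual C code from unicodeobject.c: a small bit mask built from the low bits of each line break code point rejects most characters cheaply, and only possible hits get the exact check. The names BLOOM_WIDTH, make_bloom_mask, is_linebreak and find_line_end, the mask width of 64, and the choice of the str.splitlines() character set are assumptions made for this illustration.

```python
# Characters that str.splitlines() treats as line boundaries.
LINE_BREAKS = "\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"

# Width of the bit mask (an assumption for this sketch).
BLOOM_WIDTH = 64

def make_bloom_mask(chars):
    """Set one bit per character, indexed by the low bits of its code point."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask

LINEBREAK_MASK = make_bloom_mask(LINE_BREAKS)

def is_linebreak(ch):
    """Cheap mask test first; exact membership test only on a possible hit."""
    if not (LINEBREAK_MASK >> (ord(ch) & (BLOOM_WIDTH - 1))) & 1:
        return False           # definitely not a line break
    return ch in LINE_BREAKS   # may be a false positive, so confirm

def find_line_end(text, start=0):
    """Return the index of the first line break at or after start, or -1."""
    for i in range(start, len(text)):
        if is_linebreak(text[i]):
            return i
    return -1
```

The mask can give false positives (for example "J", whose low bits match "\n"), which is why the exact check stays in place. It never gives false negatives.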
BTW: I'm not sure why the io layer records the line endings it has seen. This makes processing more complicated for no apparent reason. In the few cases where you might need this (I don't see any), you could just as well scan the lines in a quick loop using Python.
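Such a loop could look like the sketch below. The helper name seen_line_endings is made up for this example. It only looks at the three endings the io layer currently knows about and assumes the lines still carry their original endings, for example because the stream was opened with newline="".

```python
import io

def seen_line_endings(lines):
    """Collect the kinds of line endings found in an iterable of lines."""
    seen = set()
    for line in lines:
        # Test "\r\n" first, since such a line also ends with "\n".
        if line.endswith("\r\n"):
            seen.add("\r\n")
        elif line.endswith("\n"):
            seen.add("\n")
        elif line.endswith("\r"):
            seen.add("\r")
    return seen

# newline="" keeps the line endings untranslated when reading.
stream = io.StringIO("a\r\nb\nc\rd", newline="")
print(sorted(seen_line_endings(stream)))
```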