Antoine Pitrou wrote:
M.-A. Lemburg <mal <at> egenix.com> writes:
Please file a bug report for this. f.readlines() (or rather the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) for detecting line break characters.
Actually, no. It has been designed from the start to only recognize the "standard" line break representations found in common formats/protocols (CR, LF and CR+LF). People wanting to split on arbitrary unicode line breaks should use str.splitlines().
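To make the distinction concrete, here is a small illustrative sketch (the sample string is made up) that feeds the same mixed input to the io layer's line splitting and to str.splitlines(). It wraps the bytes in io.TextIOWrapper with newline="", which turns on universal-newline splitting but leaves the terminators untranslated, so you can see exactly where each approach breaks lines. U+2028 LINE SEPARATOR is one of the extra boundaries str.splitlines() recognizes.

```python
import io

# Mixed terminators: CR, CR+LF, U+2028 LINE SEPARATOR, LF.
text = "a\rb\r\nc\u2028d\ne"

# newline="" enables universal-newline splitting (CR, LF, CR+LF only)
# while returning the line endings untranslated.
stream = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")),
                          encoding="utf-8", newline="")
io_lines = stream.readlines()

# str.splitlines() also treats U+2028 (and other Unicode line
# boundaries) as line breaks.
str_lines = text.splitlines()

print(io_lines)   # U+2028 stays inside the third line
print(str_lines)  # U+2028 splits "c" from "d"
```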
The fairly long-standing RFE relating to an arbitrarily selectable newline separator seems relevant here: http://bugs.python.org/issue1152248
As with the discussion there, the problem with using str.splitlines is that it prevents pipelining approaches that avoid reading a whole file into memory.
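A rough sketch of the pipelining alternative: a generator that reads fixed-size chunks and yields records terminated by an arbitrary separator, so only the current chunk plus one partial record sits in memory. The name `iter_records` and its parameters are hypothetical. This is not the readrecords() API from the tracker issue, just an illustration of the idea.

```python
import io
from typing import Iterator, TextIO


def iter_records(stream: TextIO, sep: str, chunk_size: int = 8192) -> Iterator[str]:
    """Hypothetical helper: yield records ending in *sep* (kept attached,
    like readline() keeps "\\n"), reading *stream* chunk by chunk."""
    if not sep:
        raise ValueError("separator must be non-empty")
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        start = 0
        # Emit every complete record currently in the buffer.
        while (idx := buf.find(sep, start)) >= 0:
            end = idx + len(sep)
            yield buf[start:end]
            start = end
        # Keep the unterminated tail; a separator split across two
        # chunks is found once the next chunk is appended.
        buf = buf[start:]
    if buf:
        yield buf  # final record without a trailing separator


records = list(iter_records(io.StringIO("a||b||c"), "||", chunk_size=2))
```

The tail is re-scanned from its start on every chunk, which is fine for a sketch but could be tightened for very long records.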
While removing the validity check from readlines() completely is questionable (the readrecords() approach mentioned in the tracker issue would still be better there), loosening the validity check to be based on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it a feature request rather than a bug, though.)
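For comparison, a loosened check along the lines of Py_UNICODE_ISLINEBREAK can be approximated in pure Python by running str.splitlines() over a sliding buffer. This is a minimal sketch, assuming a text stream opened with newline="" so the io layer does not translate "\r" first. The generator name `iter_unicode_lines` is hypothetical and is not part of the io module. The last piece of each buffer is held back because it may be an incomplete line, or a "\r" whose "\n" arrives in the next chunk.

```python
import io
from typing import Iterator, TextIO


def iter_unicode_lines(stream: TextIO, chunk_size: int = 8192) -> Iterator[str]:
    """Hypothetical helper: lazily yield lines split on every boundary that
    str.splitlines() recognizes (LF, CR, CR+LF, U+2028, ...), keeping the
    line endings, without reading the whole stream into memory."""
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        lines = (buf + chunk).splitlines(keepends=True)
        # The last piece may be incomplete (or a lone "\r" awaiting its
        # "\n"), so hold it back until more data arrives.
        buf = lines.pop()
        yield from lines
    if buf:
        yield buf


data = "a\u2028b\r\nc\rd"
lines = list(iter_unicode_lines(io.StringIO(data), chunk_size=2))
# Same split as data.splitlines(keepends=True), but read incrementally.
```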