[Python-Dev] PEP 385: the eol-type issue

Thu Aug 6 12:19:38 CEST 2009

Antoine Pitrou wrote:
> M.-A. Lemburg <mal <at> egenix.com> writes:
>> Please file a bug report for this. f.readlines() (or rather
>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
>> for detecting line break characters.
> 
> Actually, no. It has been designed from the start to only recognize the
> "standard" line break representations found in common formats/protocols (CR, LF
> and CR+LF).
> People wanting to split on arbitrary unicode line breaks should use
> str.splitlines().
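
To make the distinction above concrete, here is a small sketch (the sample string is made up for illustration) comparing what the io layer in default universal-newlines mode treats as a line ending with what str.splitlines() splits on:

```python
import io

# Made-up sample: CR, CR+LF and LF, followed by two Unicode-only
# line breaks (U+0085 NEXT LINE and U+2028 LINE SEPARATOR).
text = "one\rtwo\r\nthree\nfour\x85five\u2028six"

# Default text mode (newline=None): only CR, LF and CR+LF end a line,
# and each is translated to "\n" on input.
wrapper = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8")
io_lines = wrapper.readlines()

# str.splitlines() also splits on the Unicode line breaks.
split_lines = text.splitlines()

print(io_lines)     # the U+0085 and U+2028 characters stay inside the last line
print(split_lines)  # six separate entries
```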

The fairly long-standing RFE relating to an arbitrarily selectable
newline separator seems relevant here:
http://bugs.python.org/issue1152248

As with the discussion there, the problem with using str.splitlines is
that it prevents pipelining approaches that avoid reading a whole file
into memory.
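
For illustration, one way to get splitlines() semantics without loading the whole file is to split chunk by chunk and hold back the last, possibly incomplete, piece. This is only a rough sketch: iter_unicode_lines and its chunk_size parameter are hypothetical names, not an existing API, and the stream is assumed to be opened without newline translation (e.g. newline="").

```python
import io

def iter_unicode_lines(stream, chunk_size=8192):
    """Lazily yield lines using the same boundaries as str.splitlines().

    Hypothetical helper, not part of the io module. Line endings are kept.
    """
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.splitlines(keepends=True)
        # Hold back the last piece: it may be an unfinished line, or end
        # in a CR whose matching LF only arrives with the next chunk.
        pending = lines.pop()
        yield from lines
    if pending:
        yield from pending.splitlines(keepends=True)

# Tiny chunk size so that line breaks straddle chunk boundaries.
sample = "one\rtwo\r\nthree\x85four\u2028five"
streamed = list(iter_unicode_lines(io.StringIO(sample), chunk_size=4))
```

Only one chunk plus the held-back remainder is in memory at any time, so this can sit in the middle of a generator pipeline.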

While removing the validity check from readlines() completely is
questionable (the readrecords() approach mentioned in the tracker issue
would still be better there), loosening the validity check to be based
on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
a feature request rather than a bug, though.)
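
As a rough idea of what reading records with a selectable separator could look like in pure Python (the name read_records, its signature and its behaviour are assumptions made for this sketch, not what the tracker issue specifies):

```python
import io

def read_records(stream, separator, chunk_size=8192):
    """Lazily yield records from a text stream, split on `separator`.

    Hypothetical helper for illustration only. The separator is dropped
    from the yielded records.
    """
    if not separator:
        raise ValueError("separator must be a non-empty string")
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last separator is complete; the tail may
        # still grow (or contain half a separator) and is held back.
        *complete, pending = pending.split(separator)
        yield from complete
    if pending:
        yield pending

# Small chunk size so the two-character separator is split across reads.
records = list(read_records(io.StringIO("x--y--z"), "--", chunk_size=2))
nul_records = list(read_records(io.StringIO("a\0b\0"), "\0"))
```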

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------