[Python-Dev] PEP 385: the eol-type issue

M.-A. Lemburg mal at egenix.com
Thu Aug 6 12:40:09 CEST 2009


Nick Coghlan wrote:
> Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Please file a bug report for this. f.readlines() (or rather
>>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
>>> for detecting line break characters.
>>
>> Actually, no. It has been designed from the start to only recognize the
>> "standard" line break representations found in common formats/protocols (CR, LF
>> and CR+LF).
>> People wanting to split on arbitrary unicode line breaks should use
>> str.splitlines().
> 
> The fairly long-standing RFE relating to an arbitrarily selectable
> newline separator seems relevant here:
> http://bugs.python.org/issue1152248
> 
> As with the discussion there, the problem with using str.splitlines is
> that it prevents pipelining approaches that avoid reading a whole file
> into memory.
> 
> While removing the validity check from readlines() completely is
> questionable (the readrecords() approach mentioned in the tracker issue
> would still be better there), loosening the validity check to be based
> on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
> a feature request rather than a bug though).
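The pipelining concern can be made concrete with a small sketch. This is not part of the io module or of the tracker issue. It is a hypothetical generator, iter_unicode_lines(), that reads a text stream in fixed-size chunks and applies str.splitlines() to each chunk, so every Unicode line break is honoured without loading the whole file. It assumes the stream was opened with newline="" so that no newline translation happens before splitting.

```python
import io
from typing import Iterator, TextIO


def iter_unicode_lines(stream: TextIO, chunk_size: int = 8192) -> Iterator[str]:
    """Hypothetical helper: yield lines from `stream`, splitting on every
    boundary recognised by str.splitlines(), one chunk at a time."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pieces = (pending + chunk).splitlines(keepends=True)
        # The last piece may be an incomplete line, or a line ending in "\r"
        # whose "\n" arrives with the next chunk, so hold it back.
        pending = pieces.pop()
        yield from pieces
    if pending:
        yield pending


text = "a\u2028b\rc\r\nd\ne"
# newline="" keeps "\r" and "\r\n" untranslated so splitlines() sees them.
lines = list(iter_unicode_lines(io.StringIO(text, newline=""), chunk_size=3))
print(lines)  # ['a\u2028', 'b\r', 'c\r\n', 'd\n', 'e']
```

Holding back the final piece of every chunk is what keeps a CR+LF pair together when it straddles a chunk boundary; the cost is that a very long line is re-split on every chunk, which a real implementation would want to avoid.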

I've had a look at the io implementation: it appears to be
based on the universal newline support idea, which addresses
only a fixed set of "new line" character combinations and is
not as straightforward to extend to all Unicode line break
characters as I thought.
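A quick way to see the difference described above, assuming a current CPython 3 io module: a TextIOWrapper in universal newlines mode (newline=None) only breaks on CR, LF and CR+LF, while str.splitlines() also breaks on characters such as U+2028 LINE SEPARATOR. The helper newline_accepted() is a hypothetical name used only for illustration; it shows that the newline argument itself is restricted to the fixed set of values.

```python
import io

# Text containing U+2028 LINE SEPARATOR plus the three "standard" endings.
TEXT = "a\u2028b\rc\r\nd\ne"

# Universal newlines mode recognises only CR, LF and CR+LF, translating all
# three to "\n"; U+2028 stays inside the first line.
wrapper = io.TextIOWrapper(io.BytesIO(TEXT.encode("utf-8")),
                           encoding="utf-8", newline=None)
io_lines = wrapper.readlines()

# str.splitlines() also treats U+2028 (and other Unicode breaks) as line ends.
split_lines = TEXT.splitlines(keepends=True)


def newline_accepted(value):
    """Hypothetical helper: True if TextIOWrapper accepts `value` as newline."""
    try:
        io.TextIOWrapper(io.BytesIO(), encoding="utf-8", newline=value)
    except ValueError:
        return False
    return True


print(io_lines)     # ['a\u2028b\n', 'c\n', 'd\n', 'e']
print(split_lines)  # ['a\u2028', 'b\r', 'c\r\n', 'd\n', 'e']
print(newline_accepted("\r\n"), newline_accepted("\u2028"))  # True False
```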

What I don't understand is why the io layer tries to reinvent
the wheel here instead of just using the codec's .readline()
method, which *does* use .splitlines() and has full support
for all Unicode line break characters (including the CRLF
combination).
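For comparison, a minimal sketch of the codec route, assuming the StreamReader behaviour of current CPython (the 2009 code may have differed in detail): codecs.getreader() returns a StreamReader class whose readline(), used when iterating, splits with .splitlines(). U+2028 and a lone CR therefore end a line, CR+LF stays together, and line endings are passed through untranslated.

```python
import codecs
import io

TEXT = "a\u2028b\rc\r\nd\ne"

# Wrap a byte stream in the UTF-8 StreamReader; iterating calls readline().
reader = codecs.getreader("utf-8")(io.BytesIO(TEXT.encode("utf-8")))
codec_lines = list(reader)

print(codec_lines)  # ['a\u2028', 'b\r', 'c\r\n', 'd\n', 'e']
```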

-- 
Marc-Andre Lemburg
eGenix.com
