[Python-Dev] PEP 385: the eol-type issue
M.-A. Lemburg
mal at egenix.com
Thu Aug 6 12:40:09 CEST 2009
Nick Coghlan wrote:
> Antoine Pitrou wrote:
>> M.-A. Lemburg <mal <at> egenix.com> writes:
>>> Please file a bug report for this. f.readlines() (or rather
>>> the io layer) should be using Py_UNICODE_ISLINEBREAK(ch)
>>> for detecting line break characters.
>>
>> Actually, no. It has been designed from the start to only recognize the
>> "standard" line break representations found in common formats/protocols (CR, LF
>> and CR+LF).
>> People wanting to split on arbitrary unicode line breaks should use
>> str.splitlines().
>
> The fairly long-standing RFE relating to an arbitrarily selectable
> newline separator seems relevant here:
> http://bugs.python.org/issue1152248
>
> As with the discussion there, the problem with using str.splitlines is
> that it prevents pipelining approaches that avoid reading a whole file
> into memory.
>
> While removing the validity check from readlines() completely is
> questionable (the readrecords() approach mentioned in the tracker issue
> would still be better there), loosening the validity check to be based
> on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it
> a feature request rather than a bug though).
I've had a look at the io implementation: it appears to be
based on the universal newline support idea, which addresses
only a fixed set of "new line" character combinations and is
not as straightforward to extend to support all Unicode
line break characters as I thought.
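To make the difference concrete, here is a minimal sketch (Python 3 syntax;
the sample string and the variable names are made up for illustration) that
compares what the io text layer and str.splitlines() treat as a line end:

```python
import io

# Sample text (hypothetical) containing LINE SEPARATOR (U+2028), LF,
# CR+LF and NEL (U+0085).
text = "one\u2028two\nthree\r\nfour\x85five"
data = text.encode("utf-8")

# Text layer in universal newlines mode (newline=None): only CR, LF and
# CR+LF end a line, and they are translated to "\n" on input.
with io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline=None) as f:
    io_lines = f.readlines()

# str.splitlines() splits on the full set of Unicode line boundaries.
split_lines = text.splitlines(keepends=True)

print(io_lines)     # U+2028 and U+0085 stay inside the lines
print(split_lines)  # every line boundary character ends a line
```

With this input the io layer should report three lines, while
str.splitlines() reports five.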
What I don't understand is why the io layer tries to reinvent
the wheel here instead of just using the codec's .readline()
method - which *does* use .splitlines() and has full support
for all Unicode line break characters (including the CRLF
combination).
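A rough sketch of what I mean, assuming a UTF-8 encoded byte stream (the
sample data and variable names are again only illustrative). The codec's
StreamReader hands out one line per call, so the caller does not have to
split the whole text itself:

```python
import codecs
import io

# Hypothetical sample data: U+2028, LF, CR+LF and NEL (U+0085) as line ends.
text = "one\u2028two\nthree\r\nfour\x85five"
stream = io.BytesIO(text.encode("utf-8"))

# codecs.getreader() returns the StreamReader class for the encoding;
# its .readline() keeps the line end by default (keepends=True).
reader = codecs.getreader("utf-8")(stream)

codec_lines = []
while True:
    line = reader.readline()
    if not line:  # empty string signals end of stream
        break
    codec_lines.append(line)

print(codec_lines)
```

Each of the five pieces comes back as its own line, with CR+LF kept
together as a single line end.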
--
Marc-Andre Lemburg
eGenix.com