detecting newline character
Thomas 'PointedEars' Lahn
PointedEars at web.de
Sat Apr 23 19:30:29 EDT 2011
Daniel Geržo wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Chris Rebert wrote:
>>> Daniel Geržo wrote:
>>>> [f.newlines is None after f.readlines()
>>>> when f = codecs.open(…, mode='rU', encoding='ascii'),
>>>> but not when f = codecs.open(…, mode='rU')]
>>>
>>> […]
>>> I would speculate that the upshot of this is that codecs.open() ends
>>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>>> resulting in strange behavior.
>>>
>>> If this explanation is correct, then there are 2 bugs:
>>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>>> reject mode strings which involve both.
>>> 2. codecs.open() should either reject modes involving "U", or be fixed
>>> so that they work as expected.
>>
>> You might be correct that it is a bug (already fixed in versions newer
>> than 2.5), since codecs.open() from my Python 2.6 reads as follows:
>
> Well I am doing this on:
> Python 2.7.1 (r271:86832, Mar 7 2011, 14:28:09)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
>
> So what do you guys advise me to do?

RTSL (read the source, Luke), fix where necessary (see my other follow-up),
check the trunk, and if necessary submit a patch.

For an immediate solution, do not do what is not supposed to work (calling
codecs.open(…, mode='U')).  You can find out which of the three kinds of
newlines occur in the text with, e.g.

    import re

    self.newline = list(
        set(re.findall(r'\r?\n|\r', ''.join(fobj.readlines()))))

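
The same idea as a self-contained sketch. The names detect_newlines and
detect_newlines_in_file are made up for illustration; they are not part of
the codecs module or of anything else in this thread. It assumes the file is
read in binary mode, so that no newline translation happens before the
regular expression sees the data, and that the whole file fits in memory.

```python
import re

# Alternation order matters: try '\r\n' before a lone '\n' or '\r',
# so that a CRLF pair is reported once and not as two separate newlines.
_NEWLINE_RE = re.compile(r'\r\n|\n|\r')


def detect_newlines(text):
    """Return the sorted list of distinct newline sequences found in text."""
    return sorted(set(_NEWLINE_RE.findall(text)))


def detect_newlines_in_file(path, encoding='ascii'):
    """Hypothetical helper: decode the raw bytes ourselves, then scan them."""
    # 'rb' keeps '\r\n' and '\r' intact; text mode could translate them.
    with open(path, 'rb') as fobj:
        return detect_newlines(fobj.read().decode(encoding))
```

An empty result means the text contains no line breaks at all, which is
different from f.newlines being None because the attribute was never set.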
Please trim your quotes to the relevant minimum (see above for example).
--
PointedEars