detecting newline character

Sat Apr 23 19:30:29 EDT 2011

Daniel Geržo wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Chris Rebert wrote:
>>> Daniel Geržo wrote:
>>>> [f.newlines is None after f.readlines()
>>>>  when f = codecs.open(…, mode='rU', encoding='ascii'),
>>>>  but not when f = codecs.open(…, mode='rU')]
>>>
>>> […]
>>> I would speculate that the upshot of this is that codecs.open() ends
>>> up calling built-in open() with a nonsense `mode` of "rUb" or similar,
>>> resulting in strange behavior.
>>>
>>> If this explanation is correct, then there are 2 bugs:
>>> 1. Built-in open() should treat "b" and "U" as mutually exclusive and
>>> reject mode strings which involve both.
>>> 2. codecs.open() should either reject modes involving "U", or be fixed
>>> so that they work as expected.
>>
>> You might be correct that it is a bug (already fixed in versions newer
>> than 2.5), since codecs.open() from my Python 2.6 reads as follows:
> 
> Well I am doing this on:
> Python 2.7.1 (r271:86832, Mar  7 2011, 14:28:09)
> [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
> 
> So what do you guys advise me to do?

Read the source (RTSL), fix it locally where necessary (see my other
follow-up), check whether the trunk still shows the problem, and if it
does, submit a patch.

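For an immediate solution, do not do what is not supposed to work (calling
codecs.open(…) with an encoding and a mode that contains 'U').  According
to the codecs documentation, such files are always opened in binary mode, so
no newline translation takes place and the line terminators stay in the
text.  You can therefore find out which of the three kinds of newline occur
in it yourself with, e.g.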

  import re  # at module level
  # Distinct line terminators ('\r\n', '\n', '\r') that occur in the file.
  self.newline = list(
    set(re.findall(r'\r?\n|\r', fobj.read())))
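
The same idea as a standalone function might look like the sketch below.
It assumes that io.open() (available since Python 2.6) is acceptable in
place of codecs.open(); passing newline='' makes it decode the text without
translating line endings.  The name detect_newlines and the default encoding
are made up for illustration and do not come from the code under discussion.

```python
import io
import re


def detect_newlines(path, encoding='ascii'):
    """Return the sorted distinct line terminators found in the file.

    Hypothetical helper for illustration only.
    """
    # newline='' disables newline translation, so '\r\n' and '\r'
    # reach the regular expression unchanged.
    with io.open(path, mode='r', encoding=encoding, newline='') as fobj:
        text = fobj.read()
    # '\r?\n' matches CRLF as one terminator before a lone '\r' is tried.
    return sorted(set(re.findall(r'\r?\n|\r', text)))
```

A file without any line terminator yields an empty list, which is roughly
the case where f.newlines would be None.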

Please trim your quotes to the relevant minimum (see above for example).

-- 
PointedEars