On 11/08/07, Guido van Rossum <guido@python.org> wrote:
> On 8/11/07, Tony Lownds <tony@pagedna.com> wrote:
> > Is this ok: when newline='\r\n' or newline='\r' is passed, only that string is used to determine the end of lines. No translation to '\n' is done.
>
> I *think* it would be more useful if it always returned lines ending in \n (not \r\n or \r). Wouldn't it? Although this is not how it currently behaves; when you set newline='\r\n', it returns the \r\n unchanged, so it would make sense to do this too when newline='\r'. Caveat user I guess.
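
The explicit-terminator behaviour discussed above can be sketched as follows. This is a minimal illustration that assumes the newline semantics of `io.TextIOWrapper` as they ended up in released Python 3, which is not necessarily what the py3k branch did at the time of this thread. The helper name `read_lines` is made up for the example.

```python
import io

def read_lines(data: bytes, newline):
    """Decode `data` as text with the given newline mode and return its lines."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline)
    return text.readlines()

# With an explicit terminator, only that string ends a line,
# and it is handed back untranslated.
crlf_lines = read_lines(b"a\r\nb\r\n", "\r\n")
cr_lines = read_lines(b"a\rb\r", "\r")

# A bare \n does not end a line when newline='\r\n'.
mixed_lines = read_lines(b"a\nb\r\nc", "\r\n")

print(crlf_lines)
print(cr_lines)
print(mixed_lines)
```

Under those assumptions, neither `'\r\n'` nor `'\r'` is translated to `'\n'`, which matches the behaviour described for newline='\r\n' above.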
Neither this wording nor the PEP is clear to me, but I'm assuming/hoping that there will be a way to spell the current behaviour for universal newlines on input[1], namely that files can have *either* bare \n *or* the combination \r\n to delimit lines. Whichever is used (I have no need for mixed-style files) gets translated to \n, so that the program sees the same data regardless.

[1] ... at least the bit I care about :-)

This behaviour is immensely useful for uniform treatment of Windows text files, which are an inconsistent mess of \n-only and \r\n conventions. Specifically, I'm looking to replicate this behaviour:

xxd crlf
0000000: 610d 0a62 0d0a                           a..b..

xxd lf
0000000: 610a 620a                                a.b.

python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> open('crlf').read()
'a\nb\n'
>>> open('lf').read()
'a\nb\n'

As demonstrated, this is the default in Python 2.5. I'd hope it was so in 3.0 as well.

Sorry I can't test this for myself - I don't have the time/toolset to build my own Py3k on Windows...

Paul.
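
For anyone with a Python 3 interpreter to hand, the check above can be repeated with a short script. This is only a sketch: it assumes the `open()` semantics of released Python 3, where `newline=None` (the default) enables universal newlines with translation to `\n`, and the helper name `read_text` is invented for the example. The two byte strings are the same contents as the `crlf` and `lf` files in the xxd dumps.

```python
import os
import tempfile

def read_text(data: bytes) -> str:
    """Write `data` to a temporary file, then read it back in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        # Write in binary mode so the bytes on disk are exactly `data`.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # newline=None is the default; it is spelled out here for clarity.
        with open(path, encoding="ascii", newline=None) as f:
            return f.read()
    finally:
        os.remove(path)

crlf_text = read_text(b"a\r\nb\r\n")  # same bytes as the 'crlf' file
lf_text = read_text(b"a\nb\n")        # same bytes as the 'lf' file

print(repr(crlf_text), repr(lf_text))
```

If both values compare equal, the program sees the same data regardless of which convention the file used, which is the property asked for here.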