Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows
Greg Ewing
Grrk. That's the problem. You don't get back what you have written.
You do as long as you *don't* use universal newlines mode for reading. This is the best that can be done, because universal newlines are inherently ambiguous.
I don't know PRECISELY what you mean by "universal newlines mode", and this issue is all about the details, so any response would merely enhance the confusion.
If you want universal newlines, you just have to accept that you can't also have \r characters meaning something other than newlines in your files. This is true regardless of what programming language or I/O model is being used.
No, that is not true, and I have used more than one model where it wasn't. Let's stick to models where newlines are special characters - I prefer the ones where they are not, but that is by the way.
Model 1: certain characters can be used only in combination. E.g. \f must occur immediately before (or after) a \n, which it modifies. \r is either a newline-with-overprint or must be associated with a \n. In both cases, only ONE of the alternatives is permitted in the chosen model - the other use then becomes an error (and raises an exception).
Model 2: (BCPL) there are a variety of newline characters, \n for plain newline, \f for newline-with-form-feed and \r for newline-with-overprint. ALL cause a newline, with the associated property.
Note that the above is what the program sees - what is written to the outside world and how input is read is another matter.
But I can assure you, from my own and many other people's experience, that neither of the above models cause the confusion being shown by the postings in this thread.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1@cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
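Model 2 above can be sketched in Python as a splitter in which each of \n, \f and \r ends a line and tags it with a property. This is a minimal sketch under stated assumptions: the function name `split_bcpl_style` and the property labels are hypothetical (they are not BCPL's own names), and, as Nick notes, it models only what the program sees, not how the characters reach or leave a file.

```python
# Property attached to each newline character in a BCPL-style model.
# These labels are illustrative only; they are not taken from BCPL.
LINE_ENDINGS = {
    "\n": "plain",
    "\f": "form-feed",
    "\r": "overprint",
}


def split_bcpl_style(text):
    r"""Split text into (line, property) pairs.

    Every one of "\n", "\f" and "\r" ends a line; the character that
    ended the line decides the property recorded alongside it.
    """
    records = []
    current = []
    for ch in text:
        if ch in LINE_ENDINGS:
            records.append(("".join(current), LINE_ENDINGS[ch]))
            current = []
        else:
            current.append(ch)
    if current:
        # Trailing text with no terminator: record it with no property.
        records.append(("".join(current), None))
    return records


# An underlined title (overprint), then a body line ending a page.
print(split_bcpl_style("Title\r_____\nBody\f"))
```

Because every terminator is significant in this model, a "\r\n" pair yields an overprint line followed by an empty plain line; nothing is ambiguous, the data simply says that.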
Nick Maclaren wrote:
I don't know PRECISELY what you mean by "universal newlines mode"
I mean precisely what Python means by the term: any of "\r", "\n" or "\r\n" represent a newline, and no distinction is made between them. You only need to use that if you don't know what convention is being used by the file you're reading. And if you don't know that, you've already lost information about what the contents of the file means, and there's nothing that any I/O system can do to get it back.
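The information loss Greg describes can be seen directly with Python 3's `open()` and its `newline` parameter. This is a small sketch (the file name and contents are made up for illustration): the same file, mixing all three conventions, is read once in universal newlines mode and once with translation disabled.

```python
import os
import tempfile

# A file mixing all three conventions, with no trailing newline.
raw = b"unix\nmac\rdos\r\nend"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # newline=None (the default): universal newlines mode. "\r", "\n" and
    # "\r\n" are all translated to "\n", so the original endings are lost.
    with open(path, encoding="ascii", newline=None) as f:
        universal = f.read()

    # newline="": lines are still split on any of the three endings,
    # but the endings are returned untranslated.
    with open(path, encoding="ascii", newline="") as f:
        untranslated = f.readlines()

print(repr(universal))  # 'unix\nmac\ndos\nend'
print(untranslated)     # ['unix\n', 'mac\r', 'dos\r\n', 'end']
```

After the universal read, "mac\r" and "dos\r\n" are indistinguishable from "unix\n", which is exactly why the mode only makes sense when the file's convention is unknown anyway.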
Model 1: certain characters can be used only in combination. ...
That's all fine if you know the file adheres to those conventions. Just open it in binary mode and go for it. The I/O systems of C and/or Python are designed for environments where the files *don't* adhere to conventions as helpful as that. They're making the best of what they're given.
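"Open it in binary mode and go for it" might look like this in practice: read the raw bytes, where Python performs no newline translation, then enforce your own conventions. This sketch assumes one variant of Model 1 (\f and \r allowed only immediately before \n); the helper name `check_model1` and that choice of rule are hypothetical, for illustration only.

```python
import os
import tempfile


def check_model1(data):
    r"""Enforce one hypothetical Model 1 convention on raw bytes.

    Assumed rule: b"\f" and b"\r" may appear only immediately before
    b"\n" (i.e. as b"\f\n" or b"\r\n"); any other use is an error.
    """
    for i, byte in enumerate(data):
        if byte in b"\f\r":
            if data[i + 1:i + 2] != b"\n":
                raise ValueError(
                    f"{bytes([byte])!r} at offset {i} is not followed by a newline"
                )
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")
    with open(path, "wb") as f:
        f.write(b"page one\f\npage two\r\n")
    # Binary mode: no newline translation, so every \r and \f survives.
    with open(path, "rb") as f:
        data = check_model1(f.read())

print(data)  # b'page one\x0c\npage two\r\n'
```

A lone b"\r" (an overprint, under the other alternative) raises `ValueError` here, matching the "the other use then becomes an error" rule.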
Note that the above is what the program sees - what is written to the outside world and how input is read is another matter.
But I can assure you, from my own and many other people's experience, that neither of the above models cause the confusion being shown by the postings in this thread.
There's no confusion about how newlines are represented *inside* a Python program. The convention is quite clear - a newline is "\n" and only "\n". Confusion only arises when people try to process strings internally that don't adhere to that convention.
--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
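The confusion Greg points to shows up when a string that did not pass through universal newlines reading (for example, text decoded from bytes received over a network) is processed as if it followed the "\n" convention. A minimal sketch, assuming the hypothetical helper name `to_internal`: it applies the same translation universal newlines mode performs on input, so it carries the same loss of any distinct meaning a lone "\r" had.

```python
def to_internal(text):
    r"""Normalize "\r\n" and lone "\r" to the internal "\n" convention.

    Order matters: "\r\n" must be replaced first, otherwise each pair
    would become "\n\n".
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


received = "alpha\r\nbeta\rgamma\n"  # e.g. decoded from raw bytes

print(received.split("\n"))               # ['alpha\r', 'beta\rgamma', '']
print(to_internal(received).split("\n"))  # ['alpha', 'beta', 'gamma', '']
```

Normalizing once at the boundary keeps the rest of the program free to assume "\n" and only "\n".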