Re: [Python-Dev] [python] Re: New lines, carriage returns, and Windows

Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Excellent. While this over-simplifies the issue, let's stick to the over-simplified form, as we may be able to get somewhere.

The question is independent of what the outside system believes a text file should look like, and is solely what Python believes a sequence of characters should mean. For example, does 'A\r\nB' mean that B is separated from A by one newline or two?

The point is that, once we know that, we can design a translator to and from Python's conventions to any reasonable system (and, as I say, I have done it many times). But, if Python's own interpretation is ambiguous, it is a sure recipe for different translators being incompatible, even on the same system. Which is what has happened here.

So, damn the outside system, EXACTLY what does Python mean by such characters, and EXACTLY what uses of them are discouraged as having unspecified meanings? If we could get an answer to that precisely enough to write a parse tree with all terminals explicit, this problem would go away. And that is all that I say can or should be done. The details of how to write the translators to other file systems are then a separate matter.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

On 01/10/2007, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Python, the language, means nothing by the characters. They are bytes with defined values in a byte string (in 2.x, in 3.0 they are Unicode characters, but otherwise no difference). The *language* places no interpretation on them.

Certain library functions place an interpretation on the byte values, but you need to read the function definition for that. And (a) they may not all be consistent, and (b) they may say "follows platform behaviour", but that's the way it is, so you have to live with it.

Paul.
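To make the distinction between the language and the library functions concrete, here is a minimal sketch. It assumes a current Python 3 interpreter (which postdates this thread), and the variable names are purely illustrative. It shows that the string itself carries no line structure, while two standard `str` methods each apply their own documented rule to the same characters.

```python
# Python 3 sketch: the same four characters, interpreted three different ways.
s = 'A\r\nB'

# The language just sees four characters; no line structure is implied.
length = len(s)

# str.split only splits on the separator it is given, so '\r' stays attached.
by_lf = s.split('\n')

# str.splitlines applies its own rule and treats '\r\n' as a single break.
by_splitlines = s.splitlines()

# With the pair reversed, '\n' and '\r' count as two breaks (one empty line).
reversed_pair = 'A\n\rB'.splitlines()

print(length, by_lf, by_splitlines, reversed_pair)
```

Which answer is "right" depends on the function called, which is why each function's own definition has to be consulted.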

Nick Maclaren wrote:
Python's own interpretation is not ambiguous. The problem at hand is people wanting to use some random mixture of Python and .NET conventions.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing@canterbury.ac.nz        +--------------------------------------+

"Nick Maclaren" <nmm1@cus.cam.ac.uk> wrote in message news:E1IcGcQ-0002Hf-JZ@virgo.cus.cam.ac.uk...
| The question is independent of what the outside system believes a
| text file should look like, and is solely what Python believes a
| sequence of characters should mean. For example, does 'A\r\nB'
| mean that B is separated from A by one newline or two?

The grammar presupposes that Python code is divided into lines. Any successful interpreter must adjust to the external source's idea of line endings. This is implementation, not language definition.

The grammar itself has no notion of structure within Python string objects. The split method lets one define anything as chunk separators. The builtin compile method that uses strings as code input specifies \n and only \n as a line ending. The universal line-ending model of string output to files does the same. So from either viewpoint, the unambiguous answer to your question is 'one'.

Terry Jan Reedy
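The universal line-ending model mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses the Python 3 `io` module (newer than this thread), in-memory `io.BytesIO` buffers stand in for real files, and all variable names are hypothetical. Inside Python the text uses '\n' alone; translation to and from the external form happens at the I/O boundary.

```python
import io

# Reading: universal-newlines mode (newline=None) maps '\r\n' and '\r' to '\n'.
raw = io.BytesIO(b'A\r\nB')
reader = io.TextIOWrapper(raw, encoding='ascii', newline=None)
text = reader.read()

# Inside Python, B is now separated from A by exactly one '\n'.
lines = text.split('\n')

# Writing: each '\n' in the string is translated to the requested external form.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding='ascii', newline='\r\n')
writer.write('A\nB')
writer.flush()
written = sink.getvalue()

print(repr(text), lines, written)
```

Under these assumptions the round trip is lossless for '\r\n' input: the bytes written back are the same as the bytes read.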

Does anyone else have the feeling that discussions with Mr. MacLaren don't usually bear any fruit?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

participants (5): Greg Ewing, Guido van Rossum, Nick Maclaren, Paul Moore, Terry Reedy