New lines, carriage returns, and Windows
We ran into an interesting user-reported issue w/ IronPython and the way Python writes to files and I thought I'd get python-dev's opinion.
When writing a string in text mode that contains \r\n we both write \r\r\n because the default write mode is to replace \n with \r\n. This works great as long as you stay within an entirely Python world. Because Python uses \n for everything internally you'll never end up writing out a \r\n that gets transformed into a \r\r\n. But when interoperating with other native code (or .NET code in our case) it's fairly easy to be exposed to a string which contains \r\n. Ultimately we see odd behavior when round tripping the contents of a multi-line text box through a file.
So today users have to be aware of the fact that Python internally always uses \n. They also need to be aware of any APIs that they call that might return a string with an embedded \r\n inside of them and transform the string back into the Python version.
It could be argued that there's little value in doing the simple transformation from \r\n -> \r\r\n. Ultimately that creates a file that has line endings which aren't good on any platform. On the other hand it could also be argued that Python defines new-lines as \n and there should be no deviation from that. And doing so could be considered a slippery slope, first file deals with it, and next the standard libraries, etc... Finally this might break some apps and if we changed IronPython to behave differently we could introduce incompatibilities which we don't want.
So I'm curious: Is there a reason this behavior is useful that I'm missing? Would there be a possibility (or objections to) making \r\n be untransformed in the Py3k timeframe? Or should we just tell our users to open files in binary mode? :)
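The doubling described in this message can be reproduced on any platform by forcing the Windows-style translation explicitly. The following is a minimal sketch, not code from the thread: it assumes Python 3's io module (the discussion itself concerns Python 2.x and IronPython file objects), and the helper name write_text is hypothetical.

```python
import io

def write_text(s, newline="\r\n"):
    # Return the bytes a text-mode file would hold after writing s.
    # newline="\r\n" mimics text mode on Windows: every LF written is
    # replaced by CR LF, whether or not a CR already precedes it.
    raw = io.BytesIO()
    f = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    f.write(s)
    f.flush()
    data = raw.getvalue()
    f.detach()  # leave the BytesIO alone; only its contents were needed
    return data

plain = write_text("one\ntwo\n")        # string that uses LF only
doubled = write_text("one\r\ntwo\r\n")  # string that already contains CR LF

print(plain)
print(doubled)
```

The first call produces ordinary CR LF line endings; the second shows the extra CR that the original report is about.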
It is normal when working with Windows interaction in the Python world to be aware that you might receive strings with '\r\n' in them and do the conversion yourself. We come across this a great deal with Resolver (when working with multi line text boxes for example) and quite happily replace '\r\n' with '\n' and vice versa as needed.
As a developer who uses both Python and IronPython I say that this isn't a problem. I may be wrong or outvoted of course...
Michael
http://www.manning.com/foord
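The manual conversion in both directions might look like the sketch below. The function names to_python and to_windows are hypothetical, chosen only for illustration; the thread does not show Resolver's actual code.

```python
def to_python(text):
    # Collapse Windows line endings (CR LF) to the single LF Python code expects.
    return text.replace("\r\n", "\n")

def to_windows(text):
    # Normalise first, so an existing CR LF is not expanded into CR CR LF.
    return to_python(text).replace("\n", "\r\n")

from_text_box = "first\r\nsecond"
for_text_box = to_windows("first\nsecond\r\nthird")

print(repr(to_python(from_text_box)))
print(repr(for_text_box))
```

Normalising before expanding makes to_windows safe to call on strings that mix both conventions.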
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...
This works great as long as you stay within an entirely Python world. Because Python uses \n for everything internally
I think you misunderstand fairly significantly how this all works together. Python does not use \n "for everything internally". Python is well capable of representing \r separately, and does so if you ask it to.
So I'm curious: Is there a reason this behavior is useful that I'm missing?
I think you are missing how it works in the first place (or else you failed to communicate to me what precise behavior you find puzzling).
Regards, Martin
My understanding is that users can write code that uses only \n and Python will write the end-of-line character(s) that are appropriate for the platform when writing to a file. That's what I meant by uses \n for everything internally.
But if you write \r\n to a file Python completely ignores the presence of the \r and transforms the \n into a \r\n anyway, hence the \r\r in the resulting stream. My last question is simply does anyone find writing \r\r\n when the original string contained \r\n a useful behavior - personally I don't see how it is.
But Guido's response makes this sound like it's a problem w/ VC++ stdio implementation and not something that Python is explicitly doing. Anyway, it might be useful to have a text-mode file that you can write \r\n to and only get \r\n in the resulting file. But if the general sentiment is s.replace('\r', '') is the way to go we can advise our users of the behavior when interoperating w/ APIs that return \r\n in strings.
-----Original Message-----
From: "Martin v. Löwis" [mailto:martin@v.loewis.de]
Sent: Wednesday, September 26, 2007 3:01 PM
To: Dino Viehland
Cc: python-dev@python.org
Subject: Re: [Python-Dev] New lines, carriage returns, and Windows
Dino Viehland wrote:
My understanding is that users can write code that uses only \n and Python will write the end-of-line character(s) that are appropriate for the platform when writing to a file. That's what I meant by uses \n for everything internally.
But if you write \r\n to a file Python completely ignores the presence of the \r and transforms the \n into a \r\n anyway, hence the \r\r in the resulting stream. My last question is simply does anyone find writing \r\r\n when the original string contained \r\n a useful behavior - personally I don't see how it is.
But Guido's response makes this sound like it's a problem w/ VC++ stdio implementation and not something that Python is explicitly doing. Anyway, it might be useful to have a text-mode file that you can write \r\n to and only get \r\n in the resulting file. But if the general sentiment is s.replace('\r', '') is the way to go we can advise our users of the behavior when interoperating w/ APIs that return \r\n in strings.
We always do replace('\r\n','\n') but same difference... Michael
And if this is fine for you, given that you may have the largest WinForms / IronPython code base, I tend to think the replace may be reasonable. But we have had someone get surprised by this behavior.
-----Original Message-----
From: Michael Foord [mailto:fuzzyman@voidspace.org.uk]
Sent: Wednesday, September 26, 2007 3:15 PM
To: Dino Viehland
Cc: python-dev@python.org
Subject: Re: [python] Re: [Python-Dev] New lines, carriage returns, and Windows
Dino Viehland wrote:
And if this is fine for you, given that you may have the largest WinForms / IronPython code base, I tend to think the replace may be reasonable. But we have had someone get surprised by this behavior.
It is a slight impedance mismatch between Python and Windows - but isn't restricted to IronPython, so changing Python semantics doesn't seem like the right answer.
Alternatively a more intelligent text mode (that writes '\n' as '\r\n' and '\r\n' as '\r\n' on Windows) doesn't sound like *such* a bad idea - but you will still get caught out by this. A string read in text mode will read '\r\n' as '\n'. Setting this on a winforms component will still do the wrong thing. Better to be aware of the difference and use binary mode.
Michael
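The difference between the two read modes can be seen in a short round trip. This is a sketch under stated assumptions rather than anything posted in the thread: it uses Python 3, a temporary file, and UTF-8 as an arbitrary encoding, and text_box_value is a hypothetical stand-in for the contents of a multi-line WinForms text box.

```python
import os
import tempfile

# Hypothetical stand-in for a multi-line text box value (CR LF line endings).
text_box_value = "first line\r\nsecond line\r\n"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Binary mode: the bytes go to disk exactly as given.
    with open(path, "wb") as f:
        f.write(text_box_value.encode("utf-8"))

    # Binary read: the CR LF pairs come back untouched.
    with open(path, "rb") as f:
        restored = f.read().decode("utf-8")

    # Text-mode read with default settings: CR LF is translated to LF.
    with open(path, "r", encoding="utf-8") as f:
        translated = f.read()
finally:
    os.remove(path)

print(repr(restored))
print(repr(translated))
```

Only the binary round trip hands back a string that can be assigned to the text box unchanged.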
My understanding is that users can write code that uses only \n and Python will write the end-of-line character(s) that are appropriate for the platform when writing to a file. That's what I meant by uses \n for everything internally.
Here you misunderstand - that's only the case when the file is opened in text mode. If the file is opened in binary mode, and you write \n, then it writes just a single byte (0xA).
But if you write \r\n to a file Python completely ignores the presence of the \r and transforms the \n into a \r\n anyway, hence the \r\r in the resulting stream. My last question is simply does anyone find writing \r\r\n when the original string contained \r\n a useful behavior - personally I don't see how it is.
That's just for consistency.
But Guido's response makes this sound like it's a problem w/ VC++ stdio implementation and not something that Python is explicitly doing.
That's correct - it's the notion of "text mode" for file IO.
Anyway, it might be useful to have a text-mode file that you can write \r\n to and only get \r\n in the resulting file.
This I don't understand. Why don't you just use binary mode then? At least for Python 2.x, the *only* difference between text and binary mode is the treatment of line endings.
For Python 3, things will be different as the distinction goes further; the precise API for line endings is still being considered there.
Regards, Martin
This I don't understand. Why don't you just use binary mode then? At least for Python 2.x, the *only* difference between text and binary mode is the treatment of line endings.
That just flips the problem to the other side. Now if I have a Python library that I'm mixing w/ .NET code I need to be sure to transform the line endings, but now from \n -> \r\n (and hopefully they'd detect the new-line style they should use so it works correctly on Mono on *nix or Silverlight on OS X as well).
Dino Viehland wrote:
Why don't you just use binary mode then?
That just flips the problem to the other side.
Seems to me that IronPython really needs two string types, "Python string" and ".NET string", with automatic conversion when crossing a boundary between Python code and .NET code.
--
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg.ewing@canterbury.ac.nz
Carpe post meridiem! (I'm not a morning person.)
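A much lighter variant of this idea is to convert at each boundary call rather than introduce a second string type. The sketch below swaps in that simpler technique, a decorator, purely for illustration; python_newlines and get_text_box_value are hypothetical names, and the function body stands in for a real .NET call.

```python
import functools

def python_newlines(func):
    # Wrap a call into foreign code so that any str it returns uses LF only.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if isinstance(result, str):
            result = result.replace("\r\n", "\n")
        return result
    return wrapper

@python_newlines
def get_text_box_value():
    # Hypothetical stand-in for reading a .NET text box, which uses CR LF.
    return "line one\r\nline two"

value = get_text_box_value()
print(repr(value))
```

Unlike distinct string types, a decorator only covers the calls it is applied to, so every boundary function still has to be identified by hand.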
On 26/09/2007, Dino Viehland wrote:
My understanding is that users can write code that uses only \n and Python will write the end-of-line character(s) that are appropriate for the platform when writing to a file. That's what I meant by uses \n for everything internally.
OK, so far so good - although I'm not *quite* sure there's a self-consistent definition of "code that only uses \n". I'll assume you mean code that has a concept of lines, that lines never contain anything other than text (specifically, neither \r nor \n can appear in a line, I'll punt on whether other weird stuff like form feed is legal), and that whenever your code needs to write data to a file, it writes lines with \n alone between them.
But if you write \r\n to a file Python completely ignores the presence of the \r and transforms the \n into a \r\n anyway, hence the \r\r in the resulting stream. My last question is simply does anyone find writing \r\r\n when the original string contained \r\n a useful behavior - personally I don't see how it is.
In the above model, lines can't contain \r and between lines you only ever write \n - so where did the \r\n come from? If you receive what you think are lines from an outside source, and they contain \r, then you didn't sanity check your data. If you receive a block of raw (effectively binary!) data which you want to translate into your model, it's up to you how you cut it up into lines. If you read data using one of Python's text modes, it's up to you to understand how it works.
But Guido's response makes this sound like it's a problem w/ VC++ stdio implementation and not something that Python is explicitly doing.
I'm not sure it's a CRT issue. Certainly the \r\n vs \n confusion comes from the CRT - the underlying OS (just like Unix!!!!) only deals in files as streams of bytes. But ultimately, "lines" are an abstraction in your code. All the CRT (and Python) do is help (or maybe hinder) you with the "normal" cases.
Anyway, it might be useful to have a text-mode file that you can write \r\n to and only get \r\n in the resulting file.
I can't comment on that, other than to say that if you better defined the semantic model (lines, how things are encoded/decoded to files, etc, somewhat like I tried to above) it would be more obvious what use case this was trying to address.
But if the general sentiment is s.replace('\r', '') is the way to go we can advise our users of the behavior when interoperating w/ APIs that return \r\n in strings.
I'd say users of the relevant APIs need to understand how the APIs represent "lines", so that they can convert the received data to their program's model of lines. Of course, that probably corresponds to something like s.replace('\r','') or likely more correctly data_lines = s.split('\r\n').
A "rule of thumb" that doesn't make it clear that the concept of "line" has 2 different binary representations in 2 different areas (data back from APIs vs data from files) is likely to ultimately lead to mistakes and confusion.
If you think this is bad, wait until you have to deal with Unicode issues like what *encoding* the data is being supplied to you in. Makes guessing newline conventions seem simple (at least to this parochial English-speaker :-)) Although as this is IronPython, you may already have that covered...
Paul.
PS In real life, you often just want a cheap and cheerful answer. For that, "strip out spurious \r characters" may be fine.
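The difference between these ways of cutting API data into lines is easy to see on a small sample. This is an illustrative sketch only; the variable names are made up, and str.splitlines is added here as a standard-library alternative that the message does not mention.

```python
# Sample data as an API might return it, with CR LF between lines.
s = "alpha\r\nbeta\r\ngamma"

naive = s.split("\n")       # leaves a stray CR on every line but the last
by_crlf = s.split("\r\n")   # matches the API's own representation of a line break
generic = s.splitlines()    # treats CR LF, CR and LF each as one line break

print(naive)
print(by_crlf)
print(generic)
```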
On 9/26/07, Dino Viehland wrote:
So I'm curious: Is there a reason this behavior is useful that I'm missing?
No, it is simply the way Microsoft's C stdio library works. :-( A simple workaround would be to apply s.replace('\r', '') before writing anything of course.
Would there be a possibility (or objections to) making \r\n be untransformed in the Py3k timeframe? Or should we just tell our users to open files in binary mode? :)
Py3k supports a number of different ways of working with newlines for text files, but not (yet) one that leaves \r\n alone while translating a lone \n into \r\n. It may not be too late to change that though.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Dino Viehland wrote:
When writing a string in text mode that contains \r\n we both write \r\r\n
Maybe there should be a universal newlines mode defined for output as well as input, which translates any of "\r", "\n" or "\r\n" into the platform line ending.
Although I suspect that a string containing "\r\n" is going to cause more problems for Python applications than this. E.g. consider what happens when you try to split a string on newlines.
-- Greg Ewing
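As a rough illustration of what such an output mode would do, the translation can be written as a single substitution. This is a sketch, not a proposed implementation; universal_output is a hypothetical name, and treating "\r\n" as one line ending is a choice made here, not something the thread settled.

```python
import os
import re

# Order matters: CR LF must be tried before a lone CR or a lone LF.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")

def universal_output(text, linesep=os.linesep):
    # Replace every recognised line ending with the requested one.
    return _ANY_NEWLINE.sub(lambda match: linesep, text)

as_windows = universal_output("a\rb\nc\r\nd", linesep="\r\n")
as_unix = universal_output("a\r\nb", linesep="\n")

print(repr(as_windows))
print(repr(as_unix))
```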
Greg> Maybe there should be a universal newlines mode defined for output
Greg> as well as input, which translates any of "\r", "\n" or "\r\n"
Greg> into the platform line ending.
I thought that's the way it was supposed to work, but it clearly doesn't:
    >>> f = open("test.txt", "wt")
    >>> f.write("a\rb\rnc\n")
    7
    >>> f.close()
    >>> open("test.txt", "rb").read()
    b'a\rb\rnc\n'
I'd be open to such a change. Principle of least surprise?
Skip
[moving to python-3000]
The symmetry isn't as strong as you suggest, but I agree it would be a
useful feature. Would you mind filing a Py3k feature request so we
don't forget?
A proposal for an API given the existing newlines=... parameter
(described in detail in PEP 3116) would be even better.
And a patch would be best, but you know that. :-)
--Guido
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Greg> Maybe there should be a universal newlines mode defined for output
Greg> as well as input, which translates any of "\r", "\n" or "\r\n"
Greg> into the platform line ending.
Skip> I'd be open to such a change. Principle of least surprise?
Guido> The symmetry isn't as strong as you suggest, but I agree it would
Guido> be a useful feature. Would you mind filing a Py3k feature request
Guido> so we don't forget?
Guido> A proposal for an API given the existing newlines=... parameter
Guido> (described in detail in PEP 3116) would be even better.
I've been thinking about this some more (in lieu of actually writing up any sort of proposal ;-) and I'm not so sure it would be all that useful. If you've opened a file in text mode you should only be writing newlines as '\n' anyway. If you want to translate a text file imported from another system to use the current system's line ending just open both the input and output files in text mode.
With universal newlines mode for output, should writing '\r\n' result in one or two newlines (or one-and-a-half)? Depending on the platform you can argue that it should write out '\r\r', '\r\n\r\n' or '\n\n' or if on Windows that it should be left alone as '\r\n'. There is, of course, the current '\r\r\n' behavior as well. I don't think there's obviously one best answer.
If you want to do something esoteric, open the file in binary mode and do whatever you like.
Skip
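The translation Skip describes, opening both the input and output files in text mode, could be sketched as follows. The assumptions are Python 3's open() with its newline parameter, UTF-8 as an arbitrary encoding, and a hypothetical helper name convert_line_endings; the demo passes newline="\n" only so that its result is the same on every platform.

```python
import os
import tempfile

def convert_line_endings(src, dst, newline=None):
    # Reading with the default settings accepts CR LF, CR or LF and yields LF.
    # Writing with newline=None emits the current platform's line ending.
    with open(src, "r", encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8", newline=newline) as fout:
        for line in fin:
            fout.write(line)

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "in.txt")
    dst = os.path.join(tmpdir, "out.txt")
    with open(src, "wb") as f:
        f.write(b"a\r\nb\rc\n")  # mixed line endings from "another system"
    convert_line_endings(src, dst, newline="\n")
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```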
skip@pobox.com wrote:
I've been thinking about this some more (in lieu of actually writing up any sort of proposal ;-) and I'm not so sure it would be all that useful.
Yes, despite being the one who suggested it, I've come to the same conclusion myself. The problem should really be addressed at the source, which is the Python/.NET boundary. Anything else would just lead to ambiguity.
So I'm voting -1 on my own proposal here.
-- Greg Ewing
participants (7)
- "Martin v. Löwis"
- Dino Viehland
- Greg Ewing
- Guido van Rossum
- Michael Foord
- Paul Moore
- skip@pobox.com