Re: [Python-Dev] New lines, carriage returns, and Windows

"Paul Moore" <p.f.moore@gmail.com> wrote:
I won't. There are a few of us still left who know how this started, and here is a simplified description. Unix was a computer scientist's workbench, and made no attempt to be general. In particular, its text datastream model was appropriate for the important devices of the day - teletypes and similar. So far, so good. But what was forgotten later is that the model does NOT extend to other systems and, in particular, made no sense on the record-oriented models generally used by mainframes (see Fortran for an example). When C was standardised, this was fudged. I tried to get it improved, but it is one of the many things I failed to do. The handling of ALL of the control characters in text I/O is non-portable (even \t, despite what the standard says), and you have to follow the system's constraints if things are to work. Unfortunately, the kludging that the compiler does to map C to the operating system confuses things still further - though it is essential. Now, BCPL was an ancestor of C, but always was a more portable language (i.e. it didn't start with a specific operating system in mind), and used/uses a rather better model. In this, line separators are atomic - e.g. '\f' is newline-with-form-feed and '\r' is "newline-with-overprinting". Now, THAT model is more generic. Not fully generic, of course, but it would cater for all of Unix, CPM and its derivatives (yes, Microsoft), MacOS and most mainframes (with some reservations). So, until and unless Python chooses to define its own I/O model, these problems will continue to arise. Whether this one is a simple bug or an avoidable feature, I can't say without looking harder, but bugs are often caused by attempting to implement impossible or confusing specifications. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1@cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679

On 9/29/07, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Have you looked at Py3k at all, especially PEP 3116 (new I/O)? Python *does* have its own I/O model. There are binary files and text files. For binary files, you write bytes and the semantic model is that of an array of bytes; byte indices are seek positions. For text files, the contents is considered to be Unicode, encoded as bytes in a binary file. So text file always has an underlying binary file. Two translations take place, both of which have defaults varying by platform. One translation is encoding Unicode text into bytes upon output, and decoding bytes to Unicode text upon input. This can use any encoding supported by the encodings package. The other translation deals with line endings. Upon input, any of \r\n, \r, or \n is translated to a single \n by default (this is nhe "universal newlines" algorithm from Python 2.x). This can be tweaked or disabled. Upon output, \n is translated into a platform specific string chosen from \r\n, \r, or \n. This can also be disabled or overridden. Note that \r, when written, is never treated specially; if you want special processing for \r on output, you can write your own translation layer. That's all. There is nothing unimplementable or confusing in these specifications. Python doesn't care about record I/O on legacy OSes; it does care about variability found in practice between popular OSes. Note that \r, \n and friends in Python 3000 are either ASCII (in bytes literals) or Unicode (in text literals). Again, no support for legacy systems that don't use ASCII or a superset. Legacy OSes are called that for a reason. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
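The two translations described above can be tried out without touching a real file. This is only a sketch using today's io module: the in-memory BytesIO stands in for the underlying binary file, and newline="\r\n" is passed explicitly to imitate the Windows output default on any platform (both choices are assumptions made for illustration).

```python
import io

# A text file is a text layer on top of a binary file; BytesIO plays the
# role of the binary file here.
raw = io.BytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

# Translation 1: Unicode text is encoded to bytes.
# Translation 2: "\n" is replaced by the chosen line ending on output.
text.write("caf\u00e9\n")
text.flush()
data = raw.getvalue()

# On input, newline=None (the default) enables universal newlines:
# "\r\n", "\r" and "\n" all come back as a single "\n".
reader = io.TextIOWrapper(io.BytesIO(b"a\r\nb\rc\n"), encoding="utf-8", newline=None)
decoded = reader.read()

print(repr(data))
print(repr(decoded))
```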

Guido van Rossum wrote:
So the question is, that when a string containing '\r\n' is written to a file in text mode on a Windows platform, should it be written with the encoded representation of '\r\n' or '\r\r\n'? Purity would dictate the latter and practicality the former (IMO)... However, that would mean that round tripping a string would change it ('\r\n' would be written as '\r\n' and then read as '\n') - on the other hand (particularly given that we are treating the data as text and not a binary blob) I don't see how writing '\r\r\n' would ever actually be useful in text. +1 on just writing '\r\n' from me. Michael Foord http://www.manning.com/foord
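For anyone who wants to see the case in question, here is a minimal sketch. The helper name written_bytes is made up for this example, and newline="\r\n" is used to mimic the Windows default so that the result is the same on any platform.

```python
import io

def written_bytes(text, newline):
    # Write `text` through a text layer and return the bytes that reach
    # the underlying binary file.
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()

# Every "\n" is translated, including the one that already follows "\r".
windows_style = written_bytes("one\r\ntwo\n", newline="\r\n")

# newline="" switches the output translation off.
untranslated = written_bytes("one\r\ntwo\n", newline="")

print(repr(windows_style))
print(repr(untranslated))
```

With the translation on, the "\r" already in the string is passed through untouched and the "\n" after it is expanded, which is where '\r\r\n' comes from.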

"Michael Foord" <fuzzyman@voidspace.org.uk> wrote in message news:46FE6F92.40601@voidspace.org.uk...
| Guido van Rossum wrote:
[snip first part of nice summary of Python i/o model]
| > The other translation deals with line endings. Upon input, any of
| > \r\n, \r, or \n is translated to a single \n by default (this is nhe [sic]
| > "universal newlines" algorithm from Python 2.x). This can be tweaked
| > or disabled. Upon output, \n is translated into a platform specific
| > string chosen from \r\n, \r, or \n. This can also be disabled or
| > overridden. Note that \r, when written, is never treated specially; if
| > you want special processing for \r on output, you can write your own
| > translation layer.
| So the question is, that when a string containing '\r\n' is written to a
| file in text mode on a Windows platform, should it be written with the
| encoded representation of '\r\n' or '\r\r\n'?
I think Guido pretty clearly said that on output, the default behavior is that \r is nothing special. If you want a special case exception, write a special case translator. +1 from me. To propose otherwise is to propose that the default semantic meaning of Python text objects depend on the platform that it might be output-translated for. I believe the point of universal newline support was to get away from this.
| Purity would dictate the latter and practicality the former (IMO)...
I disagree. Special case exceptions complicate both learnability and code readability and maintainability. Simplicity is practicality. The symmetry of 'platform-line-endings =input> \n =output> platform-line-endings' is both pure and practical.
| However, that would mean that round tripping a string would change it
| ('\r\n' would be written as '\r\n' and then read as '\n')
Whereas \r\r\n would be read back as \r\n, which is what should happen. Round-trip-ability is practical to me.
| - on the other
| hand (particularly given that we are treating the data as text and not a
| binary blob) I don't see how writing '\r\r\n' would ever actually be
| useful in text.
There are two normal ways for internal Python text to have \r\n:
1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform).
2. Intentionally put there by a programmer. If s/he also chooses default \n translation on output, \r<translation of \n> is correct.
That leaves
1. Bugs due to ignorance or accident. These should be repaired.
2. Other special situations, which can be handled by disabling, overriding, and layering the defaults.
This seems enough flexibility to me. Terry Jan Reedy

Terry Reedy wrote:
Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Michael
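The sequence of events described here can be reproduced in a few lines. This is a sketch only: the UI component is replaced by a hard-coded string, the file by in-memory bytes, and save() is a hypothetical helper that writes with the Windows-style translation forced on.

```python
import io

# Bytes as they would sit on disk in a Windows text file.
on_disk = b"first\r\nsecond\r\n"

# Reading in text mode (universal newlines) turns "\r\n" into "\n".
content = io.TextIOWrapper(io.BytesIO(on_disk), encoding="ascii", newline=None).read()

# Stand-in for text taken from a UI control that reports "\r\n" endings.
user_input = "third\r\n"

def save(text):
    # Write with "\n" -> "\r\n" translation and return the resulting bytes.
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    out.write(text)
    out.flush()
    return raw.getvalue()

# Appending the UI text unchanged gives the mix of "\r\n" and "\r\r\n".
mixed = save(content + user_input)

# Normalising the UI text first keeps the file consistent.
clean = save(content + user_input.replace("\r\n", "\n"))

print(repr(mixed))
print(repr(clean))
```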

On 9/29/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy

Steven Bethard wrote:
One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-) You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this. Michael
STeVe

On 9/29/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Ahh, I see. So all the .NET components function like Python 3.0's io.open(..., newline='\n'), where no translation of \n (to or from \r\n) is performed. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
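A quick way to check what newline='\n' does in io.open is sketched below. The temporary file name and the explicit ASCII encoding are assumptions made for the example; the point is only that a '\r\n' in the data goes out and comes back unchanged.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# With newline="\n" nothing is translated on output ...
with io.open(path, "w", encoding="ascii", newline="\n") as f:
    f.write("a\r\nb\n")

# ... so the bytes on disk are exactly the characters written ...
with io.open(path, "rb") as f:
    raw = f.read()

# ... and nothing is translated on input either.
with io.open(path, "r", encoding="ascii", newline="\n") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```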

Given the current lengthy discussion about newline translation, maybe it isn't such a great thing :-) Seriously, you do need a wrapper in this particular case - to convert the .NET line ending convention to Python's. The issue here is that such a wrapper is so trivial, that it's usually easier to simply do the translation with ad hoc .replace('\r\n', '\n') calls. The problem comes when you accidentally forget a translation - then you get the clash between the .NET (\r\n) and Python (\n) models. But of course, the solution in that case is to simply add the omitted translation, not to change Python's IO model. Of course, all this grand theory is just that - theory. In my case, it helped me understand what's going on, but that's all. For real life code, you just add the appropriate replace() calls. Whether theory helps you keep track of where replace() is needed, or whether you just know, doesn't really matter much. But regardless - the Python IO model doesn't need changing. (Not even 2.x, and the py3k model is even better in this regard). Paul.
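For what it's worth, the ad hoc replace() calls can be given names so they are harder to forget. The two function names below are invented for this sketch and are not part of IronPython or any library.

```python
def to_python_newlines(text):
    # Text coming from a "\r\n" world (such as a .NET API): collapse to "\n".
    return text.replace("\r\n", "\n")

def to_dotnet_newlines(text):
    # Text going to a "\r\n" world. Normalise first so that a "\r\n"
    # already present is not turned into "\r\r\n".
    return to_python_newlines(text).replace("\n", "\r\n")

print(repr(to_python_newlines("a\r\nb\n")))
print(repr(to_dotnet_newlines("a\r\nb\n")))
```

Both helpers are idempotent, so applying one twice by accident does no harm; that is the main reason for normalising before expanding.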

Michael Foord wrote:
One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively
But it seems that you really *do* need wrappers to deal with the line endings problem, whether they're provided automatically or you do it yourself manually. This is reminiscent of the C-string vs. Pascal-string fiasco when Apple switched from Pascal to C as their main application programming language. Some development environments provided glue code that did the translation automatically; others required you to do it yourself, which was a huge nuisance. -- Greg

Michael Foord wrote:
This thread might represent an argument that you *do* need wrappers ...
You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this.
Presumably that awareness should be implemented by the "unnecessary" wrappers. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://del.icio.us/steve.holden Sorry, the dog ate my .sigline

Steve Holden wrote:
Well, it's an OS level difference and I thought that in general Python *doesn't* try to protect you from OS differences. These different line endings are returned by the components - and making the string type aware of where it comes from and transform itself accordingly seems odd. It also leaves you with all sorts of other problems like string comparison (do you ignore difference in line endings?), string length (on different sides of the .NET / IronPython strings would report different lengths). It is also different from how libraries like wxPython behave - where they *don't* protect you from OS differences and if a textbox has '\r\n' line endings - that is what you get... Michael http://www.manning.com/foord
regards Steve

Well, it's an OS level difference and I thought that in general Python *doesn't* try to protect you from OS differences.
I think that's the key point. In general, Python tries to present a "translucent" interface to the OS in which OS differences can show through, in contrast to other languages (Java?) which try to present a complete abstraction of the underlying environment. This makes Python in general more useful, though it also makes it harder to write portable code in Python, because you have to be aware of the potential differences (and they aren't particularly well documented -- it's not clear that they can be). Bill

Michael Foord wrote:
That sounds like an undesirable deficiency of those library wrappers, especially cross-platform ones like wxPython. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing@canterbury.ac.nz +--------------------------------------+

"Michael Foord" <fuzzyman@voidspace.org.uk> wrote in message news:46FE9B09.8000800@voidspace.org.uk...
| Terry Reedy wrote:
| > There are two normal ways for internal Python text to have \r\n:
| > 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the
| > same platform).
| > 2. Intentionally put there by a programmer. If s/he also chooses default \n
| > translation on output, \r<translation of \n> is correct.
| >
| Actually, I usually get these strings from Windows UI components. A file
| containing '\r\n' is read in with '\r\n' being translated to '\n'. New
| user input is added containing '\r\n' line endings. The file is written
| out and now contains a mix of '\r\n' and '\r\r\n'.
I covered this in the part you snipped: "2. Other special situations, which can be handled by disabling, overriding, and layering the defaults. This seems enough flexibility to me."
While mixing input like this may seem 'normal' to you, I believe it is 'special' considering the total Python community. I can think of at least 4 decent solutions, depending on the details of the input and what you do with it. tjr

Michael> Actually, I usually get these strings from Windows UI
Michael> components. A file containing '\r\n' is read in with '\r\n'
Michael> being translated to '\n'. New user input is added containing
Michael> '\r\n' line endings. The file is written out and now contains a
Michael> mix of '\r\n' and '\r\r\n'.
So you need a translation layer between the UI component and your code. Treat the component as a text file and perform the desired mapping. Yes? Skip

skip@pobox.com wrote:
Actually the problem was reported by one of the IronPython developers on behalf of another user. We stick to using the .NET file I/O and so don't have a problem. The only time it is an issue for us is our tests, where we have string literals in our test code (where new lines are obviously '\n') and we do a manual 'replace'. Not very difficult. It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-) Michael
Skip

Michael Foord wrote:
If you're going to do that, you really need to be consistent about it and have IronPython use \r\n internally for line endings *everywhere*, including string literals.
It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-)
I would say IronPython is getting it wrong by using inconsistent internal representations of line endings. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing@canterbury.ac.nz +--------------------------------------+

On 9/30/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I don't know what you mean by "internally". There's lots of portable code that uses the \n character in string literals (either to generate line endings or to recognize them). That code can't suddenly be made invalid. And changing all string literals that say "\n" to secretly become "\r\n" would be worse than the \r <--> \n swap that some old Apple tools used to do. (If len("\n") == 2, what would len("\r\n") be?)
Honestly, I find it hard to see much merit in this discussion. A number of Python libraries, including print() and io.py, use \n to represent line endings in memory, and translate these to/from platform-appropriate line endings when reading/writing text files. OTOH, some other APIs, for example, sockets talking various internet protocols (from SMTP to HTTP) as well as most (all?) native .NET APIs, use \r\n to represent line endings. There are any number of ways to convert between these conventions, including various invocations of s.replace() and s.splitlines() (the latter does a universal-newlines-like thing). Applications can take care of this, and APIs can choose to use either convention for line endings (or both, in the case of input). Yes, occasionally users get confused. Too bad. They'll have to learn about this issue. The issue isn't going away by wishing it to go away; it is a fundamental difference between Windows and Unix, and neither is likely to change or disappear. Changing Python to use the Windows convention internally isn't going to help one bit. Changing Python to use the platform's convention is impossible without introducing a new string escape that would mean \r\n on Windows and \n on Unix; and given that there are legitimate reasons to sometimes deal with \r\n explicitly even on Unix (and with just \n even on Windows) we wouldn't be completely isolated from the issue. Changing APIs to not represent the line ending as a character (as the Java I/O libraries do) would be too big a change (and how would we distinguish between readline() returning an empty line and EOF?) -- and I'm sure the issue still pops up in plenty of places in Java.
The best solution for IronPython is probably to have the occasional wrapper around .NET APIs that translates between \r\n and \n on the boundary between Python and .NET; but one must be able to turn this off or bypass the wrappers in cases where the data retrieved from one .NET API is just passed straight on to another .NET API (and the translation would just cause two redundant copies being made). Get used to it. End of discussion. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
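Two of the points in this message are easy to check interactively: s.splitlines() accepts any of the three line endings, and readline() can tell an empty line from EOF only because the line ending is kept as a character. The sample strings below are made up for the sketch.

```python
import io

# splitlines() treats "\r\n", "\r" and "\n" alike, much as universal
# newlines does on input.
mixed = "alpha\r\nbeta\rgamma\ndelta"
lines = mixed.splitlines()

# readline() keeps the "\n", so an empty line ("\n") and end of file ("")
# are different return values.
stream = io.StringIO("\nlast\n")
first = stream.readline()    # an empty line
second = stream.readline()   # an ordinary line
third = stream.readline()    # end of file

print(lines)
print(repr(first), repr(second), repr(third))
```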

Guido van Rossum wrote:
That's probably true. I was responding to the notion that IronPython shouldn't need any wrappers. To make that really true would require IronPython to become a different language that has a different canonical representation of newlines. It's fine with me to keep things as they are. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing@canterbury.ac.nz +--------------------------------------+

On 9/29/07, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
I don't see how this is different from Unix/C "\n" being an atomic newline character. If you're saying that BCPL is better because it defines standard semantics for more control characters than just "\n", that may be true, but C is doing about the best it can with "\n" as far as I can see, given all the crazy things that different OSes want to do with line endings. In any case, the problem which started all this isn't really an I/O problem at all, it's a mismatch between the world of Python strings which use "\n" and .NET library code expecting strings which use "\r\n". The correct thing to do with that is to translate whenever a string crosses a boundary between Python code and .NET code. This is something that ought to be done automatically by the Python/.NET interfacing machinery, maybe by having a different type for .NET strings. -- Greg
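One way to picture translating "whenever a string crosses a boundary" is a decorator around the foreign call. Everything here is hypothetical: translate_newlines is an invented name, and fake_dotnet_echo is a plain Python stand-in for a .NET API that takes and returns '\r\n' text. The undecorated function stays reachable through __wrapped__ (set by functools.wraps) for the case where data is passed straight from one such API to another.

```python
import functools

def translate_newlines(func):
    # Wrap a callable that expects and returns "\r\n" text so that Python
    # code on the outside only ever sees "\n".
    @functools.wraps(func)
    def wrapper(text):
        outgoing = text.replace("\r\n", "\n").replace("\n", "\r\n")
        result = func(outgoing)
        return result.replace("\r\n", "\n")
    return wrapper

def fake_dotnet_echo(text):
    # Stand-in for a .NET API: works purely in "\r\n" terms.
    return text + "from .NET\r\n"

echo = translate_newlines(fake_dotnet_echo)

translated = echo("hi\n")                   # crosses the boundary both ways
bypassed = echo.__wrapped__("hi\r\n")       # skips the translation

print(repr(translated))
print(repr(bypassed))
```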

On 9/29/07, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
Have you looked at Py3k at all, especially PEP 3116 (new I/O)? Python *does* have its own I/O model. There are binary files and text files. For binary files, you write bytes and the semantic model is that of an array of bytes; byte indices are seek positions. For text files, the contents is considered to be Unicode, encoded as bytes in a binary file. So text file always has an underlying binary file. Two translations take place, both of which have defaults varying by platform. One translation is encoding Unicode text into bytes upon output, and decoding bytes to Unicode text upon input. This can use any encoding supported by the encodings package. The other translation deals with line endings. Upon input, any of \r\n, \r, or \n is translated to a single \n by default (this is nhe "universal newlines" algorithm from Python 2.x). This can be tweaked or disabled. Upon output, \n is translated into a platform specific string chosen from \r\n, \r, or \n. This can also be disabled or overridden. Note that \r, when written, is never treated specially; if you want special processing for \r on output, you can write your own translation layer. That's all. There is nothing unimplementable or confusing in these specifications. Python doesn't care about record I/O on legacy OSes; it does care about variability found in practice between popular OSes. Note that \r, \n and friends in Python 3000 are either ASCII (in bytes literals) or Unicode (in text literals). Again, no support for legacy systems that don't use ASCII or a superset. Legacy OSes are called that for a reason. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
So the question is, that when a string containing '\r\n' is written to a file in text mode on a Windows platform, should it be written with the encoded representation of '\r\n' or '\r\r\n'? Purity would dictate the latter and practicality the former (IMO)... However, that would mean that round tripping a string would change it ('\r\n' would be written as '\r\n' and then read as '\n') - on the other hand (particularly given that we are treating the data as text and not a binary blob) I don't see how writing '\r\r\n' would ever actually be useful in text. +1 on just writing '\r\n' from me. Michael Foord http://www.manning.com/foord

"Michael Foord" <fuzzyman@voidspace.org.uk> wrote in message news:46FE6F92.40601@voidspace.org.uk... | Guido van Rossum wrote: [snip first part of nice summary of Python i/o model] | > The other translation deals with line endings. Upon input, any of | > \r\n, \r, or \n is translated to a single \n by default (this is nhe [sic] | > "universal newlines" algorithm from Python 2.x). This can be tweaked | > or disabled. Upon output, \n is translated into a platform specific | > string chosen from \r\n, \r, or \n. This can also be disabled or | > overridden. Note that \r, when written, is never treated specially; if | > you want special processing for \r on output, you can write your own | > translation layer. | So the question is, that when a string containing '\r\n' is written to a | file in text mode on a Windows platform, should it be written with the | encoded representation of '\r\n' or '\r\r\n'? I think Guido pretty clearly said that on output, the default behavior is that \r is nothing special. If you want a special case exception, write a special case translator. +1 from me. To propose otherwise is to propose that the default semantic meaning of Python text objects depend on the platform that it might be output-translated for. I believe the point of universal newline support was to get away from this. | Purity would dictate the latter and practicality the former (IMO)... I disagree. Special case exceptions complicate both learnability and code readability and maintainability. Simplicity is practicality. The symmetry of 'platform-line-endings =input> \n =output> plaform-line-endings' is both pure and practical. | However, that would mean that round tripping a string would change it | ('\r\n' would be written as '\r\n' and then read as '\n') Whereas \r\r\n would be read back as \r\n, which is what should happen. Round-trip-ability is practical to me. 
| - on the other | hand (particularly given that we are treating the data as text and not a | binary blob) I don't see how writing '\r\r\n' would ever actually be | useful in text. There are two normal ways for internal Python text to have \r\n: 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the same platform). 2. Intentially put there by a programmer. If s/he also chooses default \n translation on output, \r<translation of \n> is correct. The leaves 1. Bugs due to ignorance or accident. These should be repaired. 2. Other special situations, which can be handled by disabling, overriding, and layering the defaults. This seems enough flexibility to me. Terry Jan Reedy

Terry Reedy wrote:
Actually, I usually get these strings from Windows UI components. A file containing '\r\n' is read in with '\r\n' being translated to '\n'. New user input is added containing '\r\n' line endings. The file is written out and now contains a mix of '\r\n' and '\r\r\n'. Michael

On 9/29/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Out of curiosity, why don't the Python wrappers for your Windows UI components do the appropriate '\r\n' -> '\n' conversions? STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy

Steven Bethard wrote:
One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively (which in fact wrap the lower level win32 API) - and the .NET APIs are usually not as bad as you probably assume. ;-) You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this. Michael
STeVe

On 9/29/07, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Ahh, I see. So all the .NET components function like Python 3.0's io.open(..., newline='\n'), where no translation of \n (to or from \r\n) is performed. STeVe -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy

Given the current lengthy discussion about newline translation, maybe it isn't such a great thing :-) Seriously, you do need a wrapper in this particular case - to convert the .NET line ending convention to Python's. The issue here is that such a wrapper is so trivial, that it's usually easier to simply do the translation with adhoc .replace('\r\n', '\n') calls. The problem comes when you accidentally forget a translation - then you get the clash between the .NET (\r\\n) and Python (\n) models. But of course, the solution in that case is to simply add the omitted translation, not to change Python's IO model. Of course, all this grand theory is just that - theory. In my case, it helped me understand what's going on, but that's all. For real life code, you just add the appropriate replace() calls. Whether theory helps you keep track of where replace() is needed, or whether you just know, doesn't really matter much. But regardless - the Python IO model doesn't need changing. (Not even 2.x, and the py3k model is even better in this regard). Paul.

Michael Foord wrote:
One of the great things about IronPython is that you don't *need* any wrappers - you access .NET objects natively
But it seems that you really *do* need wrappers to deal with the line endings problem, whether they're provided automatically or you it yourself manually. This is reminiscent of the C-string vs. Pascal-string fiasco when Apple switched from Pascal to C as their main application programming language. Some development environments provided glue code that did the translation automatically; others required you to do it yourself, which was a huge nuisance. -- Greg

Michael Foord wrote:
This thread might represent an argument that you *do* need wrappers ...
You just have to be aware that line endings are '\r\n'. I'm not sure how or if pywin32 handles this.
Presumably that awareness should be implemented by the "unnecessary" wrappers. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://del.icio.us/steve.holden Sorry, the dog ate my .sigline

Steve Holden wrote:
Well, it's an OS level difference and I thought that in general Python *doesn't* try to protect you from OS differences. These different line endings are returned by the components - and making the string type aware of where it comes from and transform itself accordingly seems odd. It also leaves you with all sorts of other problems like string comparison (do you ignore difference in line endings?), string length (on different sides of the .NET / IronPython strings would report different lengths). It is also different from how libraries like wxPython behave - where they *don't* protect you from OS differences and if a textbox has '\r\n' line endings - that is what you get... Michael http://www.manning.com/foord
regards Steve

Well, it's an OS level difference and I thought that in general Python *doesn't* try to protect you from OS differences.
I think that's the key point. In general, Python tries to present a "translucent" interface to the OS in which OS differences can show through, in contrast to other languages (Java?) which try to present a complete abstraction of the underlying environment. This makes Python in general more useful, thought it also makes it harder to write portable code in Python, because you have to be aware of the potential differences (and they aren't particularly well documented -- it's not clear that they can be). Bill

Michael Foord wrote:
That sounds like an undesirable deficiency of those library wrappers, especially cross-platform ones like wxPython. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing@canterbury.ac.nz +--------------------------------------+

"Michael Foord" <fuzzyman@voidspace.org.uk> wrote in message news:46FE9B09.8000800@voidspace.org.uk... | Terry Reedy wrote: | > There are two normal ways for internal Python text to have \r\n: | > 1. Read from a file with \r\r\n. Then \r\r\n is correct output (on the | > same platform). | > 2. Intentially put there by a programmer. If s/he also chooses default \n | > translation on output, \r<translation of \n> is correct. | > | Actually, I usually get these strings from Windows UI components. A file | containing '\r\n' is read in with '\r\n' being translated to '\n'. New | user input is added containing '\r\n' line endings. The file is written | out and now contains a mix of '\r\n' and '\r\r\n'. I covered this in the part you snipped: "2. Other special situations, which can be handled by disabling, overriding, and layering the defaults. This seems enough flexibility to me." While mixing input like this may seem 'normal' to you, I believe it is 'special' considering the total Python community. I can think of at least 4 decent solutions, depending on the details of the input and what you do with it. tjr

Michael> Actually, I usually get these strings from Windows UI Michael> components. A file containing '\r\n' is read in with '\r\n' Michael> being translated to '\n'. New user input is added containing Michael> '\r\n' line endings. The file is written out and now contains a Michael> mix of '\r\n' and '\r\r\n'. So you need a translation layer between the UI component and your code. Treat the component as a text file and perform the desired mapping. Yes? Skip

skip@pobox.com wrote:
Actually the problem was reported by one of the IronPython developers on behalf of another user. We stick to using the .NET file I/O and so don't have a problem. The only time it is an issue for us is our tests, where we have string literals in our test code (where new lines are obviously '\n') and we do a manual 'replace'. Not very difficult. It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-) Michael
Skip

Michael Foord wrote:
If you're going to do that, you really need to be consistent about and have IronPython use \r\n internally for line endings *everywhere*, including string literals.
It is just slightly ironic that the time Python 'gets it wrong' (for some value of wrong) is when you are using text mode for I/O :-)
I would say IronPython is getting it wrong by using inconsistent internal representations of line endings. -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing@canterbury.ac.nz +--------------------------------------+

On 9/30/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I don't know what you mean by "internally". There's lots of portable code that uses the \n character in string literals (either to generate line endings or to recognize them). That code can't suddenly be made invalid. And changing all string literals that say "\n" to secretly become "\r\n" would be worse than the \r <--> \n swap that some old Apple tools used to do. (If len("\n") == 2, what would len("\r\n") be?)
Honestly, I find it hard to see much merit in this discussion. A number of Python libraries, including print() and io.py, use \n to represent line endings in memory, and translate these to/from platform-appropriate line endings when reading/writing text files. OTOH, some other APIs, for example, sockets talking various internet protocols (from SMTP to HTTP) as well as most (all?) native .NET APIs, use \r\n to represent line endings. There are any number of ways to convert between these conventions, including various invocations of s.replace() and s.splitlines() (the latter does a universal-newlines-like thing). Applications can take care of this, and APIs can choose to use either convention for line endings (or both, in the case of input).
Yes, occasionally users get confused. Too bad. They'll have to learn about this issue. The issue isn't going away by wishing it to go away; it is a fundamental difference between Windows and Unix, and neither is likely to change or disappear.
Changing Python to use the Windows convention internally isn't going to help one bit. Changing Python to use the platform's convention is impossible without introducing a new string escape that would mean \r\n on Windows and \n on Unix; and given that there are legitimate reasons to sometimes deal with \r\n explicitly even on Unix (and with just \n even on Windows) we wouldn't be completely isolated from the issue. Changing APIs to not represent the line ending as a character (as the Java I/O libraries do) would be too big a change (and how would we distinguish between readline() returning an empty line and EOF?) -- and I'm sure the issue still pops up in plenty of places in Java.
The best solution for IronPython is probably to have the occasional wrapper around .NET APIs that translates between \r\n and \n on the boundary between Python and .NET; but one must be able to turn this off or bypass the wrappers in cases where the data retrieved from one .NET API is just passed straight on to another .NET API (and the translation would just cause two redundant copies being made). Get used to it. End of discussion. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
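As a rough illustration of the s.replace() and s.splitlines() conversions mentioned above, here is a minimal sketch. The function names are invented for this example and are not part of any library; the approach shown is just one of the "any number of ways" to do it.

```python
def to_lf(text):
    """Map "\r\n" and lone "\r" to "\n" (the in-memory Python convention)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def to_crlf(text):
    """Map any of "\n", "\r\n", "\r" to "\r\n" (the Windows/.NET convention).

    Normalizing to "\n" first keeps an existing "\r\n" from turning
    into "\r\r\n", so the function is safe to apply more than once.
    """
    return to_lf(text).replace("\n", "\r\n")


def split_lines(text):
    """Split on line boundaries without keeping the terminators.

    Note: str.splitlines() also splits on a few less common characters
    (form feed, vertical tab, U+2028 and others), so it is broader than
    plain universal-newlines handling.
    """
    return text.splitlines()


sample = "one\ntwo\r\nthree\r"
print(repr(to_lf(sample)))
print(repr(to_crlf(sample)))
print(split_lines(sample))
```

Which direction to convert in, and where, remains an application decision, as the message says.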

Guido van Rossum wrote:
That's probably true. I was responding to the notion that IronPython shouldn't need any wrappers. To make that really true would require IronPython to become a different language that has a different canonical representation of newlines. It's fine with me to keep things as they are.
--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing@canterbury.ac.nz        +--------------------------------------+

On 9/29/07, Nick Maclaren <nmm1@cus.cam.ac.uk> wrote:
I don't see how this is different from Unix/C "\n" being an atomic newline character. If you're saying that BCPL is better because it defines standard semantics for more control characters than just "\n", that may be true, but C is doing about the best it can with "\n" as far as I can see, given all the crazy things that different OSes want to do with line endings. In any case, the problem which started all this isn't really an I/O problem at all, it's a mismatch between the world of Python strings which use "\n" and .NET library code expecting strings which use "\r\n". The correct thing to do with that is to translate whenever a string crosses a boundary between Python code and .NET code. This is something that ought to be done automatically by the Python/.NET interfacing machinery, maybe by having a different type for .NET strings. -- Greg
participants (10)
- Bill Janssen
- Greg Ewing
- Guido van Rossum
- Michael Foord
- Nick Maclaren
- Paul Moore
- skip@pobox.com
- Steve Holden
- Steven Bethard
- Terry Reedy