Should a file being read be required to have a single line termination type, or could they be mixed and matched? The prototype code allows mix and match, but I'm not married to that idea. If it requires a single terminator, then some performance could be gained by checking the terminator type when opening the file, and using the existing native text file code when it is a native file.
I'm not aware of any type of text file that supports switching line delimiters inside the same file.... Now that doesn't mean it couldn't exist, but logically that would be a strange file.... I think it's an even bet that a file would have the same delimiter throughout the file.
My proposed solution to these problems is to have a new type of file: a "Universal" text file. This would be a text file that would do line-feed translation to the internal representation on the fly as the file was being read (like the current text file type), but it would translate any of the known text file formats automatically (\r\n, \r, \n Any others???).
That would be an interesting idea... I'm not sure how much of a performance hit we'd see, but that would certainly solve a PC / MAC issue I'm having.... (Any chance we could see the code?)

Regarding your points on changing the default text file type.... There are several different ways to solve this.

* Don't have the Universal be the default text file type; instead offer it as a different class, or as an open('garbage.txt', "r", universal), or some other optional switch.
  - The advantage being that this lets the programmer control when Universal is used.
* Just the opposite: the programmer explicitly tells Python not to support universal...
* Have the programmer subclass the File Type?
* Add a global directive?
* Specifically import a "universal" which will deprecate the "standard" text file IO routines...
  * Actually I think this sounds like the easiest and fastest way to deal with it. This way, you could add an extension library to speed it up, or whatever... (This idea is very much along the lines (??) of the STRING / STROP import)

- Benjamin
I'm not aware of any type of text file that supports switching line delimiters inside the same file....
Now that doesn't mean it couldn't exist, but logically that would be a strange file....
I think it's an even bet that a file would have the same delimiter throughout the file.
I have observed this on Windows, where the text editor in VC++ can read files with \n line endings, and doesn't change those when it writes the file back, but always adds \r\n to lines it adds. So if you edit a file containing only \n line endings, inserting a few lines, you have mixed line endings. Also, Java supports this, and the algorithm to support it is not difficult: to read a line, read until you see either \r or \n; if you see \r, peek one character ahead and if that's a \n, include it in the line. (Haven't had the time to read the whole proposal, but a Java style text file implementation has been in my wish list for a long time.) --Guido van Rossum (home page: http://www.python.org/~guido/)
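A minimal Python sketch of the read-a-line algorithm described above, for illustration only. The function name `read_universal_line` is hypothetical (it is not code from this thread), and the sketch assumes a binary `io.BufferedReader`, whose `peek()` method returns buffered bytes without consuming them.

```python
import io


def read_universal_line(stream):
    """Read one line from an io.BufferedReader.

    The line may end in LF, CR, or CR LF. The terminator is returned as
    part of the line; an empty bytes object signals end of file.
    """
    line = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:              # end of file
            break
        line += ch
        if ch == b"\n":         # Unix-style terminator
            break
        if ch == b"\r":
            # Peek one byte ahead without consuming it.
            if stream.peek(1)[:1] == b"\n":
                line += stream.read(1)   # CR LF: include the LF
            break               # a lone CR also ends the line
    return bytes(line)


reader = io.BufferedReader(io.BytesIO(b"one\ntwo\r\nthree\rfour"))
lines = []
while True:
    line = read_universal_line(reader)
    if not line:
        break
    lines.append(line)
print(lines)
```

Reading one byte at a time keeps the logic visible but is slow; a practical version would scan whole buffers.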
Also, Java supports this, and the algorithm to support it is not difficult: to read a line, read until you see either \r or \n; if you see \r, peek one character ahead and if that's a \n, include it in the line.
What about the mac where \r *is* the line ending? Nathan
Also, Java supports this, and the algorithm to support it is not difficult: to read a line, read until you see either \r or \n; if you see \r, peek one character ahead and if that's a \n, include it in the line.
What about the mac where \r *is* the line ending?
Works fine, as long as you only *peek* (i.e. don't actually consume the character following \r if it is not \n, so it's available for the next read). Requires a little smart buffer handling, which is one reason why it's hard to do using regular stdio. Also, interactive input must be treated special (so that if the user types "foo" followed by \r, the peek doesn't force the user to type another line just so that we can peek at the character following the \r). --Guido van Rossum (home page: http://www.python.org/~guido/)
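One possible way to avoid blocking on interactive input, sketched here as an assumption rather than as the approach proposed in this thread, is to skip the peek entirely: return the line as soon as a \r is seen, remember that fact, and silently drop a \n if it turns out to be the very next character read. The class name `DeferredPeekReader` is hypothetical. Because the full terminator is not known when the line is returned, this sketch normalizes every terminator to \n.

```python
import io


class DeferredPeekReader:
    """Line reader that never reads past a CR before returning the line."""

    def __init__(self, stream):
        self._stream = stream    # any binary stream supporting read(1)
        self._skip_lf = False    # True if the previous line ended in CR

    def readline(self):
        line = bytearray()
        while True:
            ch = self._stream.read(1)
            if not ch:           # end of file
                break
            if self._skip_lf:
                self._skip_lf = False
                if ch == b"\n":
                    continue     # second half of a CR LF pair: drop it
            if ch == b"\r":
                # End the line now; decide about a following LF later.
                self._skip_lf = True
                line += b"\n"
                break
            line += ch
            if ch == b"\n":
                break
        return bytes(line)


reader = DeferredPeekReader(io.BytesIO(b"foo\rbar\r\nbaz\n\nend"))
result = []
while True:
    line = reader.readline()
    if not line:
        break
    result.append(line)
print(result)
```

With this design, a user who types "foo" followed by \r gets the line back immediately, since nothing after the \r has to be read first.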
Nathan Heagy wrote:
Also, Java supports this, and the algorithm to support it is not difficult: to read a line, read until you see either \r or \n; if you see \r, peek one character ahead and if that's a \n, include it in the line.
What about the mac where \r *is* the line ending?
then it won't be a \n, so it won't be included.

-Chris

--
Christopher Barker, Ph.D.
ChrisHBarker@home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics
Hi there! On Tuesday, September 18, 2001, at 12:42 , Chris Barker wrote:
Nathan Heagy wrote:
Also, Java supports this, and the algorithm to support it is not difficult: to read a line, read until you see either \r or \n; if you see \r, peek one character ahead and if that's a \n, include it in the line.
What about the mac where \r *is* the line ending?
then it won't be a \n, so it won't be included.
I think the point is what to do if a file on the Mac includes the combination "\r\n" where the line should end at the "\r" but the "\n" should be returned as the next character or part of the next line... Peter (who thinks text files suck :-) -- Peter H. Froehlich @ http://www.ics.uci.edu/~pfroehli/
I think the point is what to do if a file on the Mac includes the combination "\r\n" where the line should end at the "\r" but the "\n" should be returned as the next character or part of the next line...
Then it's not a universal text file. --Guido van Rossum (home page: http://www.python.org/~guido/)
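The difference between the two readings of "\r\n" can be shown with a small sketch. Both helper names are hypothetical and exist only for illustration: one splits on any of the three terminators (trying "\r\n" first), the other treats "\r" alone as the terminator, as a Mac-native reader would.

```python
import re


def split_universal(text):
    # Treat CR LF, CR, and LF all as line terminators; CR LF is tried first
    # so the pair counts as a single terminator.
    return re.split(r"\r\n|\r|\n", text)


def split_mac_native(text):
    # Treat only CR as the line terminator, so LF is ordinary data.
    return text.split("\r")


sample = "first\r\nsecond"
print(split_universal(sample))   # ['first', 'second']
print(split_mac_native(sample))  # ['first', '\nsecond']
```

A file that relies on the second behaviour has to be read with the native reader, which is the sense in which it is not a universal text file.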
"Schollnick, Benjamin" wrote:
Should a file being read be required to have a single line termination type, or could they be mixed and matched? The prototype code allows mix and match, but I'm not married to that idea. If it requires a single terminator, then some performance could be gained by checking the terminator type when opening the file, and using the existing native text file code when it is a native file.
I'm not aware of any type of text file that supports switching line delimiters inside the same file....
I agree that it wouldn't be generated on purpose, but I have seen files that got edited on different systems get mixed up...that doesn't mean we have to support it though.
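If a single terminator per file were required, the check at open time could look roughly like the sketch below. The function name `sniff_line_terminator` and the `sample_size` parameter are assumptions made for illustration, not part of the prototype. It reports only the first terminator it finds, so a file with mixed line endings, like the ones described in this thread, would be misclassified.

```python
def sniff_line_terminator(path, sample_size=4096):
    """Return b"\\n", b"\\r", b"\\r\\n", or None, based on the first terminator found."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
        if sample.endswith(b"\r"):
            # Read one more byte so a CR LF pair split across the sample
            # boundary is seen whole.
            sample += f.read(1)

    cr = sample.find(b"\r")
    lf = sample.find(b"\n")
    if cr == -1 and lf == -1:
        return None                      # no terminator in the sample
    if cr == -1 or (lf != -1 and lf < cr):
        return b"\n"                     # LF appears first
    if sample[cr + 1:cr + 2] == b"\n":
        return b"\r\n"                   # CR immediately followed by LF
    return b"\r"                         # lone CR
```

A caller could compare the result with the platform's own terminator and fall back to the existing native text file code when they match.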
That would be an interesting idea... I'm not sure how much of a performance hit we'd see, but that would certainly solve a PC / MAC issue I'm having.... (Any chance we could see the code?)
I enclosed a Python version of the code with the message. If you didn't get it, let me know and I'll send you one directly.
Regarding your points on changing the default text file type....
* Don't have the Universal be the default text file type; instead offer it as a different class, or as an open('garbage.txt', "r", universal), or some other optional switch.
Exactly. I proposed adding a "t" flag to open(). I guess I didn't write that as clearly as I might have liked.
* Just the opposite, the programmer explicitly tells Python not to support universal...
Not good, old code could break.
* Have the programmer subclass the File Type?
too much work
* Add a global directive?
maybe... but it wouldn't allow mix and match
* Specifically import a "universal" which will deprecate the "standard" text file IO routines...
That's an option too, but it would again not allow mix and match. Also, part of the point of this is to have Python use it when it imports code, so it would have to be pretty built in.
* Actually I think this sounds like the easiest and fastest way to deal with it. This way, you could add an extension library to speed it up, or whatever...
True, and that's exactly what I do with my prototype, which I am using in a bunch of my code already. Maybe some day I'll get around to writing it in C, although I'd love someone else to do it; I am a pretty lame C programmer.

-Chris
participants (5)
- Chris Barker
- Guido van Rossum
- Nathan Heagy
- Peter H. Froehlich
- Schollnick, Benjamin