In an unprecedented move, MacPython 2.2a3 is available only a scant 26 hours or so after the unix/windows distribution of 2.2a3.

MacPython 2.2a3 is available via http://www.cwi.nl/~jack/macpython.html, as usual, in hqx or macbinary form, as a full installer, an active installer and a source distribution (as usual:-).

Aside from the general new 2.2a3 features there are three specific changes in MacPython that are worth mentioning:
- The structure of the MacOS toolbox modules has changed. All the modules have been put into a "Carbon" package (which, despite the name, runs fine in the classic PPC runtime model). There is a backwards compatibility folder on sys.path that will keep imports with the old names working (with an obnoxious warning).
- Plugin modules are now in :Lib:lib-dynload instead of in :Mac:PlugIns, to make the installed tree look more like the unix tree.
- On input, unix line-endings are now acceptable for all text files. This is an experimental feature (awaiting a general solution, for which a PEP has been promised but not started yet, the guilty parties know who they are:-), and it can be turned off with a preference.

The downside of the quick release is that the installer has only been tested on MacOSX 10.0.4 and MacOS 9.1. Please report problems on older releases of MacOS asap.
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.cwi.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++
Jack Jansen wrote:
- On input, unix line-endings are now acceptable for all text files. This is an experimental feature (awaiting a general solution, for which a PEP has been promised but not started yet, the guilty parties know who they are:-), and it can be turned off with a preference.
Jack, I don't know if I qualify as one of the "guilty" parties, but I did volunteer to help with a PEP about this, and I'd still like to. I do have some ideas about what I'd like to see in that PEP. The one thing I have done is write a prototype in pure Python for how I would like platform neutral text files to work. I've enclosed it with this message, and invite comments.

Has anyone started this PEP yet? If so, I'd like to help; if not, then the following is a very early draft of my thoughts. Note that I am writing this from memory, without going back to the archives to see what all the comments were at the time. I will do that before I call this a PEP.

Here are my quick thoughts:

This started (the recent thread, anyway) with the need for MacPython (with the introduction of OS-X) to be able to read both traditional mac style text files and unix style text files. An import hook was suggested, but then it was brought up that a lot of python code can be read in other ways than an import, from execfile(), and a whole lot of others, so an import hook would not be enough.

In general, the problem stems from the fact that while Python knows what system it is running on, a file that is being read may or may not be on that same system. This is most egregious with OS-X, as you essentially have both Unix and MacOS running on the same machine at the same time, often sharing a file system. The issue also comes up with heterogeneous networks, where the file might reside on a server running on a different system than Python, and that file may be accessed by various systems. Some servers can do line feed translation on the fly, but this is not universal or foolproof.

In addition to Python code, many Python programs need to read and write text files that are not in a native format, and the format may not be known by the programmer when the code is written.

My proposed solution to these problems is to have a new type of file: a "Universal" text file.
This would be a text file that would do line-feed translation to the internal representation on the fly as the file was being read (like the current text file type), but it would translate any of the known text file formats automatically (\r\n, \r, \n; any others???). When the file was being written to, a single terminator would have to be specified, defaulting to the native one or, in the case of a file opened for appending, perhaps the one in the file when it is opened. The user could specify a non-native terminator when opening a file for writing.

Issues:

The two big issues that came up in the discussion were backward compatibility and performance:

1) The python open() function currently defaults to a text file type. However, on Posix systems, there is no difference between a text file and a binary file, so many programmers writing code that is designed to run only on such systems left the "b" flag off when opening files for binary reading and writing. If the behaviour of a file opened without the binary flag were to change, a lot of code would break.

2) In recent versions of Python, a lot of effort was put into improving performance of line oriented text file reading. These optimisations require the use of native line endings. In order to get similar performance with non-native endings, some portions of the C stdio library would have to be re-written. This is a major undertaking, and no one has stepped up to volunteer.

The proposed solution to both of these problems is to introduce a new flag to the open() function: "t". If the "t" flag is present, the function returns a Universal Text File, rather than a standard text file. As this is a new flag, no old code should be broken. The default would return a standard text file with the current behaviour. This would allow the implementation to be written in a way that was robust, but perhaps not have optimum performance. If performance were critical, a programmer could always use the old style text file.
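
To make the proposed behaviour concrete, here is a minimal sketch of the two translations a Universal Text File would perform, one when reading and one when writing. The function names to_internal and to_external are made up for illustration; they are not part of Python or of the enclosed prototype, and the sketch ignores buffering and performance entirely.

```python
def to_internal(data):
    """Read side: map DOS, Mac and Unix line endings to the internal newline."""
    # Replace the two-character DOS ending first, so that the carriage
    # return of a DOS pair is not mistaken for a separate Mac line ending.
    return data.replace("\r\n", "\n").replace("\r", "\n")


def to_external(text, terminator="\n"):
    """Write side: replace the internal newline with one chosen terminator."""
    return text.replace("\n", terminator)


# A file with mixed endings, and no terminator on its last line.
mixed = "unix line\ndos line\r\nmac line\rlast line"
print(repr(to_internal(mixed)))
print(repr(to_external(to_internal(mixed), "\r\n")))
```

Reading accepts any mix of endings, while writing emits exactly one; that asymmetry is why only the write side needs a terminator to be specified.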
If, at some point, code is written that allows the performance of Universal Text Files to approach that of standard text files, perhaps the two could be merged. It is unfortunate that the default would be the performance-optimised but less generally useful case, but that is a reasonable price to be paid for backward compatibility. Perhaps the default could be changed at some point in the future when other incompatibilities are introduced (Python 3?)

In the case of Python code being read, performance of the file read is unlikely to be critical to the performance of the application as a whole.

Issues / questions:

Some systems (VMS?) store text files in the file system as a series of lines, rather than just a string of bytes like most common systems today. It would take a little more code to accommodate this, but it could be done.

Should a file being read be required to have a single line termination type, or could they be mixed and matched? The prototype code allows mix and match, but I'm not married to that idea. If it requires a single terminator, then some performance could be gained by checking the terminator type when opening the file, and using the existing native text file code when it is a native file.

Other issues???

I'd love to hear all your feedback on this write-up, as well as my code. Please either CC me or the MacPython list, as I'm not subscribed to python-dev.

-Chris

--
Christopher Barker, Ph.D.
ChrisHBarker@home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------

#!/usr/bin/env python

"""
TextFile.py : a module that provides a UniversalTextFile class, and a
replacement for the native python "open" command that provides an
interface to that class.

It would usually be used as:

from TextFile import open

then you can use the new open just like the old one (with some added
flags and arguments)

or

import TextFile

file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize])

please send bug reports, helpful hints, and/or feature requests to:

Chris Barker

ChrisHBarker@home.net

Copyright/licence is the same as whatever version of python you are
running.

"""

import os

## Re-map the open function
_OrigOpen = open

def open(filename, flags = "r", bufsize = -1, LineEndingType = "native", LineBufferSize = ""):
    """
    A new open function, that returns a regular python file object for
    the old calls, and returns a new nifty universal text file when
    required.

    This works just like the regular open command, except that a new
    flag and a new parameter have been added.

    The new flag is "t", which indicates that the file to be opened is a
    universal text file. While the standard open() function defaults to
    a text file, on Posix systems there is no difference between a text
    file and a binary file, so there is a lot of code out there that
    opens files as text when a binary file is really required. This code
    currently works just fine on Posix systems, so it was necessary to
    introduce a new flag to maintain backward compatibility. The old
    style, line ending dependent text file will also provide better
    performance.

    To Call:

    file = open(filename, flags = "r", bufsize = -1, LineEndingType = "native"):

    - filename is the name of the file to be opened
    - flags is a string of one letter flags, the same as the standard open
      command, plus a "t" for universal text file.
        - "b" means binary file, this returns the standard binary file object
        - "t" means universal text file
        - "r" for read only
        - "w" for write. If there is both "w" and "t" then the user can
          specify a line ending type to be used with the LineEndingType
          parameter.
        - "a" means append to existing file

    - bufsize specifies the buffer size to be used by the system. Same
      as the regular open function

    - LineEndingType is used only for writing (and appending) files, to
      specify a non-native line ending to be written.
        - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the
          characters themselves ( "\\r\\n", etc. ). "native" will result in
          using the standard file object, which uses whatever is native
          for the system that python is running on.

    - LineBufferSize is the size of the buffer used to read data in
      a readline() operation. The default is currently set to 100
      characters. If you will be reading files with many lines over 100
      characters long, you should set this number to the largest expected
      line length.

    NOTE: I'm sure the flag checking could be more robust.
    """
    if "t" in flags: # this is a universal text file
        # a write-only file with native endings can use the standard file object
        if ("w" in flags) and ("+" not in flags) and LineEndingType == "native":
            return _OrigOpen(filename, flags.replace("t",""), bufsize)
        return UniversalTextFile(filename, flags, LineEndingType, LineBufferSize)
    else: # this is a regular old file
        return _OrigOpen(filename, flags, bufsize)

class UniversalTextFile:
    """
    A class that acts just like a python file object, but has a mode
    that allows the reading of arbitrarily formatted text files, i.e. with
    either Unix, DOS or Mac line endings. [\\n , \\r\\n, or \\r]

    To keep it truly universal, it checks for each of these line ending
    possibilities at every line, so it should work on a file with mixed
    endings as well.
    """
    def __init__(self, filename, flags = "r", LineEndingType = "native", LineBufferSize = ""):
        # always open the underlying file in binary mode; fall back to
        # reading if "t" was the only flag given
        mode = flags.replace("t","") or "r"
        self._file = _OrigOpen(filename, mode + "b")

        LineEndingType = LineEndingType.lower()
        if LineEndingType == "native":
            self.LineSep = os.linesep  # os.linesep is a string, not a function
        elif LineEndingType == "dos":
            self.LineSep = "\r\n"
        elif LineEndingType == "posix" or LineEndingType == "unix":
            self.LineSep = "\n"
        elif LineEndingType == "mac":
            self.LineSep = "\r"
        else:
            self.LineSep = LineEndingType

        ## some attributes
        self.closed = 0
        self.mode = flags
        self.softspace = 0
        if LineBufferSize:
            self._BufferSize = LineBufferSize
        else:
            self._BufferSize = 100

    def readline(self):
        start_pos = self._file.tell()
        ##print "Current file position is:", start_pos
        line = ""
        TotalBytes = 0
        Buffer = self._file.read(self._BufferSize)
        while Buffer:
            ##print "Buffer = ",repr(Buffer)
            newline_pos = Buffer.find("\n")
            return_pos = Buffer.find("\r")
            if newline_pos < 0 and return_pos == len(Buffer)-1:
                # the buffer ends in "\r": it may be the first half of a DOS
                # "\r\n" pair, so look one character further before deciding
                NextChar = self._file.read(1)
                if NextChar:
                    Buffer = Buffer + NextChar
                    continue
            if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line
                line = Buffer[:return_pos] + "\n"
                TotalBytes = newline_pos+1
                break
            elif ((return_pos < newline_pos) or newline_pos < 0) and return_pos >= 0: # we have a Mac line
                line = Buffer[:return_pos] + "\n"
                TotalBytes = return_pos+1
                break
            elif newline_pos >= 0: # we have a Posix line
                line = Buffer[:newline_pos] + "\n"
                TotalBytes = newline_pos+1
                break
            else: # we need a larger buffer
                NewBuffer = self._file.read(self._BufferSize)
                if NewBuffer:
                    Buffer = Buffer + NewBuffer
                else: # we are at the end of the file, without a line ending.
                    self._file.seek(start_pos + len(Buffer))
                    return Buffer

        # leave the file positioned just past the line that was returned
        self._file.seek(start_pos + TotalBytes)
        return line

    def readlines(self, sizehint = None):
        """
        readlines acts like the regular readlines, except that it
        understands any of the standard text file line endings ("\\r\\n",
        "\\n", "\\r").

        If sizehint is used, it will read a maximum of that many
        bytes. It will never round up, as the regular readlines sometimes
        does. This means that if your buffer size is less than the length
        of the next line, you'll get an empty list, which could
        incorrectly be interpreted as the end of the file.
        """
        if sizehint:
            Data = self._file.read(sizehint)
        else:
            Data = self._file.read()

        if len(Data) == sizehint:
            #print "The buffer is full"
            FullBuffer = 1
        else:
            FullBuffer = 0

        Data = Data.replace("\r\n","\n").replace("\r","\n")
        Lines = [line + "\n" for line in Data.split('\n')]
        ## If the last line is only a linefeed it is an extra line
        if Lines[-1] == "\n":
            del Lines[-1]
        ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on.
        else:
            ## or it's the end of the buffer
            if FullBuffer:
                self._file.seek(-(len(Lines[-1])-1),1) # reset the file position
                del(Lines[-1])
            else:
                Lines[-1] = Lines[-1][:-1]
        return Lines

    def readnumlines(self, NumLines = 1):
        """
        readnumlines is an extension to the standard file object. It
        returns a list containing the number of lines that are
        requested. I have found this to be very useful, and allows me
        to avoid the many loops like:

        lines = []
        for i in range(N):
            lines.append(file.readline())

        Also, if I ever get around to writing this in C, it will provide a speed improvement.
        """
        Lines = []
        while len(Lines) < NumLines:
            Lines.append(self.readline())
        return Lines

    def read(self, size = None):
        """
        read acts like the regular read, except that it translates any of
        the standard text file line endings ("\\r\\n", "\\n", "\\r") into a
        "\\n"

        If size is used, it will read a maximum of that many bytes,
        before translation. This means that if the line endings have
        more than one character, the size returned will be smaller. This
        could be fixed, but it didn't seem worth it. If you want that
        much control, use a binary file.
        """
        if size:
            Data = self._file.read(size)
        else:
            Data = self._file.read()

        return Data.replace("\r\n","\n").replace("\r","\n")

    def write(self, string):
        """
        write is just like the regular one, except that it uses the line
        separator specified when the file was opened for writing or
        appending.
        """
        self._file.write(string.replace("\n",self.LineSep))

    def writelines(self, list):
        for line in list:
            self.write(line)

    # The rest of the standard file methods mapped
    def close(self):
        self._file.close()
        self.closed = 1

    def flush(self):
        self._file.flush()

    def fileno(self):
        return self._file.fileno()

    def seek(self, offset, whence = 0):
        self._file.seek(offset, whence)

    def tell(self):
        return self._file.tell()
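
As a companion to the prototype, here is a compact sketch of the same line-splitting idea written as a generator over a binary stream. It reads fixed-size chunks and holds back a trailing carriage return until the next chunk arrives, so that a DOS pair split across two reads still counts as one line ending. The name iter_universal_lines and its chunk_size parameter are invented for this illustration and are not part of TextFile.py; the sketch assumes a Python that has the io module and bytes literals.

```python
import io


def iter_universal_lines(stream, chunk_size=200):
    """Yield lines from a binary stream, normalising every line ending to LF."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        while True:
            cr = pending.find(b"\r")
            lf = pending.find(b"\n")
            if cr < 0 and lf < 0:
                break  # no complete line in the buffer yet
            if cr >= 0 and (lf < 0 or cr < lf):
                if cr == len(pending) - 1:
                    # A trailing CR may be the first half of a DOS pair:
                    # wait for the next chunk before deciding.
                    break
                # DOS ending if an LF follows directly, otherwise a Mac ending.
                end = cr + 2 if pending[cr + 1:cr + 2] == b"\n" else cr + 1
                line = pending[:cr]
            else:
                # Posix ending.
                end = lf + 1
                line = pending[:lf]
            yield line + b"\n"
            pending = pending[end:]
    # End of file: what is left is a final Mac line or an unterminated line.
    if pending.endswith(b"\r"):
        yield pending[:-1] + b"\n"
    elif pending:
        yield pending


data = b"one\r\ntwo\rthree\nfour"
# A chunk size of 4 splits the DOS pair after "one" across two reads.
print(list(iter_universal_lines(io.BytesIO(data), chunk_size=4)))
```

Like the prototype, this allows mixed endings within one file; requiring a single terminator per file would let the check be done once when the file is opened instead of on every line.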