[Python-checkins] CVS: python/nondist/peps pep-0278.txt,NONE,1.1 pep-0000.txt,1.151,1.152

Barry Warsaw bwarsaw@users.sourceforge.net
Wed, 23 Jan 2002 05:24:28 -0800


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv12056

Modified Files:
	pep-0000.txt 
Added Files:
	pep-0278.txt 
Log Message:
Added PEP 278, Universal Newline Support, Jack Jansen


--- NEW FILE: pep-0278.txt ---
PEP: 278
Title: Universal Newline Support
Version: $Revision: 1.1 $
Last-Modified: $Date: 2002/01/23 13:24:26 $
Author: jack@cwi.nl (Jack Jansen)
Status: Draft
Type: Standards Track
Created: 14-Jan-2002
Python-Version: 2.3
Post-History:


Abstract

    This PEP discusses a way in which Python can support I/O on files
    which have a newline format that is not the native format on the
    platform, so that Python on each platform can read and import
    files with CR (Macintosh), LF (Unix) or CR LF (Windows) line
    endings.

    It is increasingly common to come across files whose line endings
    do not match the convention of the current platform: files
    downloaded over the net, remotely mounted filesystems on a
    different platform, Mac OS X with its double standard of Mac and
    Unix line endings, etc.
    
    Many tools such as editors and compilers already handle this
    gracefully; it would be good if Python did so too.


Specification

    Universal newline support needs to be enabled when configuring
    Python.
    
    In a Python with universal newline support the feature is
    automatically enabled for all import statements and execfile()
    calls.
    
    In a Python with universal newline support, the mode parameter of
    open() can also be "t", meaning "open for input as a text file
    with universal newline interpretation".  Mode "t" cannot be
    combined with other mode flags such as "+".
    
    There is no special support for output to file with a different
    newline convention.
    
    A file object that has been opened in universal newline mode gets
    a new attribute "newlines" which reflects the newline convention
    used in the file.  The value for this attribute is one of None (no
    newline read yet), "\r", "\n", "\r\n" or "mixed" (multiple
    different types of newlines seen).
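
    The input side of this specification can be sketched in pure
    Python (modern Python 3).  This is only an illustration under
    stated assumptions, not the C reference implementation.  The
    helper read_universal() is hypothetical.  It works on a complete
    in-memory bytes buffer rather than on a FileObject read in chunks,
    so it never has to handle a CR at the end of one buffer followed
    by an LF at the start of the next.  It translates CR, LF and CR LF
    to LF and computes the value the proposed "newlines" attribute
    would take.

```python
def read_universal(data):
    """Translate CR, LF and CR LF line endings in `data` (bytes) to LF.

    Returns (text, newlines), where `newlines` mimics the proposed
    attribute: None, "\r", "\n", "\r\n" or "mixed".
    Hypothetical helper for illustration only.
    """
    out = bytearray()
    seen = set()          # distinct conventions encountered so far
    i, n = 0, len(data)
    while i < n:
        ch = data[i:i + 1]
        if ch == b"\r":
            if data[i + 1:i + 2] == b"\n":   # CR LF counts as one newline
                seen.add("\r\n")
                i += 2
            else:                            # lone CR (classic Mac OS)
                seen.add("\r")
                i += 1
            out += b"\n"
        elif ch == b"\n":                    # lone LF (Unix)
            seen.add("\n")
            out += b"\n"
            i += 1
        else:
            out += ch
            i += 1
    if not seen:
        newlines = None
    elif len(seen) == 1:
        newlines = seen.pop()
    else:
        newlines = "mixed"
    return bytes(out), newlines
```

    For example, read_universal(b"a\r\nb\rc\n") returns
    (b"a\nb\nc\n", "mixed").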

    
Rationale

    Universal newline support is implemented in C, not in Python.
    This is done because we want files with a foreign newline
    convention to be import-able, so a Python Lib directory can be
    shared over a remote file system connection, or between MacPython
    and Unix-Python on Mac OS X.  For this to be feasible, universal
    newline support needs to have a reasonably small impact on
    performance, which rules out a Python implementation, as it would
    bog down all imports.  Nor would a quick check for the newlines
    used in a file (handing the import off to the existing C code if
    a platform-local newline is seen) work, because some files mix
    newline conventions, which Visual C++ and other Windows tools will
    happily produce.  Finally, a C implementation also allows
    tracebacks and such (which open the Python source module) to be
    handled easily.
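
    To see why a quick check is insufficient, consider a hypothetical
    check that looks only at the first line ending.  On a file with
    mixed conventions it can report the platform-local convention
    even though later lines use another one.  Handing such a file to
    the ordinary reader would then mis-split those lines.  The helper
    names below are illustrative only.

```python
import re

# Matches any of the three conventions; CR LF is tried first so it is
# not mistaken for a lone CR followed by a lone LF.
_NEWLINE = re.compile(rb"\r\n|\r|\n")

def first_newline(data):
    """Hypothetical 'quick check': the convention of the first line ending."""
    m = _NEWLINE.search(data)
    return m.group().decode("ascii") if m else None

def all_newlines(data):
    """Every distinct convention that occurs anywhere in `data`."""
    return {m.group().decode("ascii") for m in _NEWLINE.finditer(data)}

# A Windows-style first line followed by a bare CR and a bare LF.
sample = b"line one\r\nline two\rline three\n"
print(repr(first_newline(sample)))   # prints '\r\n'
print(sorted(all_newlines(sample)))  # prints ['\n', '\r', '\r\n']
```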
    
    Universal newline support is implemented (for this release) as a
    compile time option because there is a performance penalty, even
    though it should be a small one.
    
    There is no output implementation of universal newlines: Python
    programs are expected to handle this themselves, or otherwise
    write files with the platform-local convention.  The reason is
    that input is the difficult case; outputting different newlines
    to a file is already easy enough in Python.
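
    The following sketch illustrates how little is needed on the
    output side.  It writes lines with an explicitly chosen convention
    by opening the file in binary mode, so no platform text-mode
    translation can interfere.  The function write_lines() is
    hypothetical.  The sketch uses modern Python 3 str/bytes handling
    rather than the Python 2 APIs of this proposal's era.

```python
import os
import tempfile

def write_lines(path, lines, newline="\n", encoding="utf-8"):
    """Write each string in `lines` (without its terminator) to `path`,
    followed by `newline`, which must be LF, CR or CR LF.

    The file is opened in binary mode so that no platform text-mode
    translation can alter the chosen line endings."""
    if newline not in ("\n", "\r", "\r\n"):
        raise ValueError("unsupported newline: %r" % (newline,))
    with open(path, "wb") as f:
        for line in lines:
            f.write((line + newline).encode(encoding))

# Usage: write a file with classic Mac OS (CR) line endings on any platform.
path = os.path.join(tempfile.mkdtemp(), "mac.txt")
write_lines(path, ["first", "second"], newline="\r")
with open(path, "rb") as f:
    data = f.read()
print(repr(data))  # prints b'first\rsecond\r'
```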
    
    While universal newlines are automatically enabled for import,
    they are not enabled for opening files: there you have to ask for
    them explicitly with open(..., "t").  This is open to debate, but
    here are a few reasons for this design:

    - Compatibility.  Programs which already do their own
      interpretation of \r\n in text files would break.  Programs
      which open binary files as text files on Unix would also break
      (but it could be argued they deserve it :-).
      
    - Interface clarity.  Universal newlines are only supported for
      input files, not for input/output files, as the semantics would
      become muddy.  Would you write Mac newlines if all reads so far
      had encountered Mac newlines?  But what if you then later read a
      Unix newline?
    
    The newlines attribute is included so that programs that really
    care about the newline convention, such as text editors, can
    examine what was in a file.  They can then save (a copy of) the
    file with the same newline convention (or, in case of a file with
    mixed newlines, ask the user what to do, or output in platform
    convention).
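
    The value of the "newlines" attribute maps directly onto the
    convention an editor would write back.  The sketch below is
    hypothetical.  Both the function save_newline() and its policy
    are assumptions made for illustration.  For "mixed" or None, it
    falls back to the platform convention (os.linesep) instead of
    asking the user.

```python
import os

def save_newline(newlines, default=os.linesep):
    """Pick the line ending to use when saving a file, given the value of
    the proposed "newlines" attribute observed while reading it.
    Hypothetical policy for illustration only."""
    if newlines in ("\r", "\n", "\r\n"):
        return newlines      # preserve the file's single convention
    # None (no line ending read yet) or "mixed": fall back to the
    # platform convention; an interactive editor could ask the user here.
    return default
```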
    
    Feedback is explicitly solicited on one item in the reference
    implementation: whether or not the universal newlines routines
    should grab the global interpreter lock.  Currently they do not,
    but this could be considered living dangerously, as they may
    modify fields in a FileObject.  But as these routines also
    replace fgets() and fread(), it may be difficult to decide whether
    or not the lock is held when they are called.  Moreover, the only
    danger is that if two threads read the same FileObject at the same
    time, an extraneous newline may be seen or the "newlines"
    attribute may inadvertently be set to "mixed".  I would argue that
    if you read the same FileObject in two threads simultaneously you
    are asking for trouble anyway.

    
Reference Implementation

    A reference implementation is available in SourceForge patch #476814.


References

    None.


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End:

Index: pep-0000.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0000.txt,v
retrieving revision 1.151
retrieving revision 1.152
diff -C2 -d -r1.151 -r1.152
*** pep-0000.txt	2002/01/13 00:13:38	1.151
--- pep-0000.txt	2002/01/23 13:24:26	1.152
***************
*** 89,92 ****
--- 89,93 ----
   S   276  Simple Iterator for ints                     Althoff
   S   277  Unicode file name support for Windows NT     Hodgson
+  S   278  Universal Newline Support                    Jansen
  
   Finished PEPs (done, implemented in CVS)
***************
*** 238,241 ****
--- 239,243 ----
   S   276  Simple Iterator for ints                     Althoff
   S   277  Unicode file name support for Windows NT     Hodgson
+  S   278  Universal Newline Support                    Jansen
   SR  666  Reject Foolish Indentation                   Creighton
  
***************
*** 271,274 ****
--- 273,277 ----
      Hudson, Michael          mwh@python.net
      Hylton, Jeremy           jeremy@zope.com
+     Jansen, Jack             jack@cwi.nl
      Kuchling, Andrew         akuchlin@mems-exchange.org
      Lemburg, Marc-Andre      mal@lemburg.com