[Patches] fileinput.py argument handling (and suggestion)

Greg Ward gward@mems-exchange.org
Mon, 10 Apr 2000 15:16:27 -0400


On 10 April 2000, Greg Stein said:
> +1 on adding the new class to fileinput.py. I've implemented similar code
> several times... and would like to avoid that :-)

Well, that's one.  Is anyone else interested in seeing a super-beefed up
(possibly over-engineered) "read a `text' file with some subset of the
usual Unix conventions" class added to fileinput.py?

Here's the docstring for my nominee (text_file.py, which you presently
all have in your Python source tree squirreled away in Lib/distutils):

"""Provides a file-like object that takes care of all the things you
   commonly want to do when processing a text file that has some
   line-by-line syntax: strip comments (as long as "#" is your comment
   character), skip blank lines, join adjacent lines by escaping the
   newline (i.e. backslash at end of line), strip leading and/or
   trailing whitespace, and collapse internal whitespace.  All of these
   are optional and independently controllable.

   Provides a 'warn()' method so you can generate warning messages that
   report physical line number, even if the logical line in question
   spans multiple physical lines.  Also provides 'unreadline()' for
   implementing line-at-a-time lookahead.

   Constructor is called as:

       TextFile (filename=None, file=None, **options)

   It bombs (RuntimeError) if both 'filename' and 'file' are None;
   'filename' should be a string, and 'file' a file object (or
   something that provides 'readline()' and 'close()' methods).  It is
   recommended that you supply at least 'filename', so that TextFile
   can include it in warning messages.  If 'file' is not supplied,
   TextFile creates its own using the 'open()' builtin.

   The options are all boolean, and affect the value returned by
   'readline()':
     strip_comments [default: true]
       strip from "#" to end-of-line, as well as any whitespace
       leading up to the "#" -- unless it is escaped by a backslash
     lstrip_ws [default: false]
       strip leading whitespace from each line before returning it
     rstrip_ws [default: true]
       strip trailing whitespace (including line terminator!) from
       each line before returning it
     skip_blanks [default: true]
       skip lines that are empty *after* stripping comments and
       whitespace.  (If both lstrip_ws and rstrip_ws are false,
       then some lines may consist of solely whitespace: these will
       *not* be skipped, even if 'skip_blanks' is true.)
     join_lines [default: false]
       if a backslash is the last non-newline character on a line
       after stripping comments and whitespace, join the following line
       to it to form one "logical line"; if N consecutive lines end
       with a backslash, then N+1 physical lines will be joined to
       form one logical line.
     collapse_ws [default: false]
       after stripping comments and whitespace and joining physical
       lines into logical lines, all internal whitespace (strings of
       whitespace surrounded by non-whitespace characters, and not at
       the beginning or end of the logical line) will be collapsed
       to a single space.

   Note that since 'rstrip_ws' can strip the trailing newline, the
   semantics of 'readline()' must differ from those of the builtin file
   object's 'readline()' method!  In particular, 'readline()' returns
   None for end-of-file: an empty string might just be a blank line (or
   an all-whitespace line), if 'rstrip_ws' is true but 'skip_blanks' is
   not."""
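
To make the 'readline()' semantics above concrete, here is a rough
sketch.  It is for illustration only and is not the actual text_file.py
code: 'MiniTextFile' is a made-up name, it handles only strip_comments,
rstrip_ws, skip_blanks and join_lines, and it ignores backslash-escaped
"#" characters.

```python
import io


class MiniTextFile:
    """Hypothetical, cut-down stand-in for the TextFile class described above."""

    def __init__(self, file, strip_comments=True, rstrip_ws=True,
                 skip_blanks=True, join_lines=False):
        self.file = file
        self.strip_comments = strip_comments
        self.rstrip_ws = rstrip_ws
        self.skip_blanks = skip_blanks
        self.join_lines = join_lines

    def readline(self):
        """Return the next logical line, or None (not "") at end-of-file."""
        buildup = ""                    # partial logical line being joined
        while True:
            line = self.file.readline()
            if line == "":              # underlying file is exhausted
                return buildup if buildup else None

            if self.strip_comments:
                pos = line.find("#")
                if pos != -1:
                    # drop the comment and the whitespace leading up to it,
                    # but keep the line terminator for now
                    eol = "\n" if line.endswith("\n") else ""
                    line = line[:pos].rstrip() + eol

            if self.join_lines:
                line = buildup + line
                buildup = ""
                body = line.rstrip()
                if body.endswith("\\"):
                    # continuation: remember the text so far, read more
                    buildup = body[:-1]
                    continue

            if self.rstrip_ws:
                line = line.rstrip()

            if self.skip_blanks and line in ("", "\n"):
                continue                # empty after stripping: skip it

            return line


def read_all(text, **options):
    """Assumed convenience helper: collect logical lines until None."""
    tf = MiniTextFile(io.StringIO(text), **options)
    lines = []
    while (line := tf.readline()) is not None:
        lines.append(line)
    return lines
```

With 'skip_blanks' off, a blank line comes back as an empty string, so
only None can signal end-of-file; that is why the loop in 'read_all'
tests for None rather than for a false value.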

Is this too much?  Not enough?  Silly to put it in fileinput.py when
I'll already have to include it with the Distutils for pre-1.6 Pythons?

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange / CNRI                           voice: +1-703-262-5376
Reston, Virginia, USA                            fax: +1-703-262-5367