[Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17

Greg Ward gward@cnri.reston.va.us
Tue, 7 Mar 2000 09:04:30 -0500


On 05 March 2000, Guido van Rossum said:
> - Variants on the syntax could be given through some kind of option
> system rather than through subclassing -- they should be combinable
> independently.  Some possible options (maybe I'm going overboard here)
> could be:
> 
> 	- comment characters: ('#', ';', both, others?)
> 	- comments after variables allowed? on sections?
> 	- variable characters: (':', '=', both, others?)
> 	- quoting of values with "..." allowed?
> 	- backslashes in "..." allowed?
> 	- does backslash-newline mean a continuation?
> 	- case sensitivity for section names (default on)
> 	- case sensitivity for option names (default off)
> 	- variables allowed before first section name?
> 	- first section name?  (default "main")
> 	- character set allowed in section names
> 	- character set allowed in variable names
> 	- %(...) substitution?

I agree with Fred that this level of flexibility is probably overkill
for a config file parser; you don't want every application author who
uses the module to have to explain his particular variant of the syntax.

However, if you're interested in a class that *does* provide some of the
above flexibility, I have written such a beast.  It's currently used to
parse the Distutils MANIFEST.in file, and I've considered using it for
the mythical Distutils config files.  (And it also gets heavy use in my
day job.)  It's really a class for reading a file in preparation for
"text processing the Unix way", though: it doesn't say anything about
syntax, it just worries about blank lines, comments, continuations, and
a few other things.  Here's the class docstring:

class TextFile:

    """Provides a file-like object that takes care of all the things you
       commonly want to do when processing a text file that has some
       line-by-line syntax: strip comments (as long as "#" is your comment
       character), skip blank lines, join adjacent lines by escaping the
       newline (i.e. backslash at end of line), strip leading and/or
       trailing whitespace, and collapse internal whitespace.  All of these
       are optional and independently controllable.

       Provides a 'warn()' method so you can generate warning messages that
       report physical line number, even if the logical line in question
       spans multiple physical lines.  Also provides 'unreadline()' for
       implementing line-at-a-time lookahead.

       Constructor is called as:

           TextFile (filename=None, file=None, **options)

       It bombs (RuntimeError) if both 'filename' and 'file' are None;
       'filename' should be a string, and 'file' a file object (or
       something that provides 'readline()' and 'close()' methods).  It is
       recommended that you supply at least 'filename', so that TextFile
       can include it in warning messages.  If 'file' is not supplied,
       TextFile creates its own using the 'open()' builtin.

       The options are all boolean, and affect the value returned by
       'readline()':
         strip_comments [default: true]
           strip from "#" to end-of-line, as well as any whitespace
           leading up to the "#" -- unless it is escaped by a backslash
         lstrip_ws [default: false]
           strip leading whitespace from each line before returning it
         rstrip_ws [default: true]
           strip trailing whitespace (including line terminator!) from
           each line before returning it
         skip_blanks [default: true]
           skip lines that are empty *after* stripping comments and
           whitespace.  (If both lstrip_ws and rstrip_ws are false,
           then some lines may consist of solely whitespace: these will
           *not* be skipped, even if 'skip_blanks' is true.)
         join_lines [default: false]
           if a backslash is the last non-newline character on a line
           after stripping comments and whitespace, join the following line
           to it to form one "logical line"; if N consecutive lines end
           with a backslash, then N+1 physical lines will be joined to
           form one logical line.
         collapse_ws [default: false]
           after stripping comments and whitespace and joining physical
           lines into logical lines, all internal whitespace (strings of
           whitespace surrounded by non-whitespace characters, and not at
           the beginning or end of the logical line) will be collapsed
           to a single space.

       Note that since 'rstrip_ws' can strip the trailing newline, the
       semantics of 'readline()' must differ from those of the builtin file
       object's 'readline()' method!  In particular, 'readline()' returns
       None for end-of-file: an empty string might just be a blank line (or
       an all-whitespace line), if 'rstrip_ws' is true but 'skip_blanks' is
       not."""
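
To make the interaction of those options concrete, here is a rough sketch of
a 'readline()' that honours them.  It is not the TextFile code itself: the
class name MiniTextFile, the regular expressions, and the order of the steps
are assumptions made for illustration, and 'warn()' and 'unreadline()' are
left out entirely.

```python
import io
import re


class MiniTextFile:
    """Hypothetical, cut-down stand-in for the TextFile described above."""

    def __init__(self, file, strip_comments=True, lstrip_ws=False,
                 rstrip_ws=True, skip_blanks=True, join_lines=False,
                 collapse_ws=False):
        self.file = file
        self.strip_comments = strip_comments
        self.lstrip_ws = lstrip_ws
        self.rstrip_ws = rstrip_ws
        self.skip_blanks = skip_blanks
        self.join_lines = join_lines
        self.collapse_ws = collapse_ws
        self.lineno = 0                     # physical lines read so far

    def readline(self):
        pending = None                      # unfinished logical line, if any
        while True:
            line = self.file.readline()
            if line == "":                  # underlying file hit end-of-file
                return pending              # None unless a join was pending
            self.lineno += 1
            if self.strip_comments:
                # Drop an unescaped "#" through end-of-line, plus the
                # whitespace before it; then turn "\#" into a literal "#".
                line = re.sub(r"\s*(?<!\\)#.*", "", line).replace("\\#", "#")
            if self.lstrip_ws:
                line = line.lstrip()
            if self.rstrip_ws:
                line = line.rstrip()        # also removes the line terminator
            if pending is not None:
                line = pending + line
                pending = None
            if self.join_lines and line.rstrip("\r\n").endswith("\\"):
                pending = line.rstrip("\r\n")[:-1]
                continue
            if self.skip_blanks and line in ("", "\n"):
                continue
            if self.collapse_ws:
                line = re.sub(r"(?<=\S)\s+(?=\S)", " ", line)
            return line


src = io.StringIO(
    "# full-line comment\n"
    "\n"
    "include  *.py   # trailing comment\n"
    "recursive-include \\\n"
    "    docs *.txt\n"
    "literal \\# hash\n"
)
tf = MiniTextFile(src, lstrip_ws=True, join_lines=True, collapse_ws=True)
lines = []
while True:
    line = tf.readline()
    if line is None:                        # None, not "", marks end-of-file
        break
    lines.append(line)
# lines == ["include *.py", "recursive-include docs *.txt", "literal # hash"]
```

The comment line and the blank line vanish, the two physical lines ending
in a backslash come back as one logical line, and 'tf.lineno' ends up at 6
even though only three logical lines were returned.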

Interested in having something like this in the core?  Adding more
options is possible, but the code is already on the hairy side to
support all of these.  And I'm not a big fan of the subtle difference in
semantics with file objects, but honestly couldn't think of a better way
at the time.
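
For anyone wondering why the semantics have to differ, a small illustration
with a plain in-memory file.  The helper 'readline_rstripped' is a made-up
name for this example, not part of TextFile; it only mimics the "strip
trailing whitespace, return None at end-of-file" behaviour described in the
docstring.

```python
import io
from functools import partial

# The builtin convention: "" means end-of-file, a blank line is "\n".
f = io.StringIO("first\n\nlast\n")
raw = [f.readline() for _ in range(4)]
# raw == ["first\n", "\n", "last\n", ""]

# Once trailing whitespace is stripped, the blank line and end-of-file
# both look like "", so "" can no longer serve as the end marker.
stripped = [s.rstrip() for s in raw]
# stripped == ["first", "", "last", ""]


def readline_rstripped(fileobj):
    """Hypothetical helper: rstrip each line, return None at end-of-file."""
    line = fileobj.readline()
    if line == "":
        return None
    return line.rstrip()


# With None as the sentinel the blank line survives and the loop still ends.
f = io.StringIO("first\n\nlast\n")
logical = list(iter(partial(readline_rstripped, f), None))
# logical == ["first", "", "last"]
```

The two-argument form of 'iter()' keeps calling the function until it
returns the sentinel, so a caller can still write an ordinary for loop.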

If you're interested, you can download it from

    http://www.mems-exchange.org/exchange/software/python/text_file/

or just use the version in the Distutils CVS tree.

        Greg