[Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib ConfigParser.py,1.16,1.17
Greg Ward
gward@cnri.reston.va.us
Tue, 7 Mar 2000 09:04:30 -0500
On 05 March 2000, Guido van Rossum said:
> - Variants on the syntax could be given through some kind of option
> system rather than through subclassing -- they should be combinable
> independently. Some possible options (maybe I'm going overboard here)
> could be:
>
> - comment characters: ('#', ';', both, others?)
> - comments after variables allowed? on sections?
> - variable characters: (':', '=', both, others?)
> - quoting of values with "..." allowed?
> - backslashes in "..." allowed?
> - does backslash-newline mean a continuation?
> - case sensitivity for section names (default on)
> - case sensitivity for option names (default off)
> - variables allowed before first section name?
> - first section name? (default "main")
> - character set allowed in section names
> - character set allowed in variable names
> - %(...) substitution?
I agree with Fred that this level of flexibility is probably overkill
for a config file parser; you don't want every application author who
uses the module to have to explain his particular variant of the syntax.
However, if you're interested in a class that *does* provide some of the
above flexibility, I have written such a beast. It's currently used to
parse the Distutils MANIFEST.in file, and I've considered using it for
the mythical Distutils config files. (And it also gets heavy use in my
day job.) It's really a class for reading a file in preparation for
"text processing the Unix way", though: it doesn't say anything about
syntax, it just worries about blank lines, comments, continuations, and
a few other things. Here's the class docstring:
class TextFile:
    """Provides a file-like object that takes care of all the things you
    commonly want to do when processing a text file that has some
    line-by-line syntax: strip comments (as long as "#" is your comment
    character), skip blank lines, join adjacent lines by escaping the
    newline (ie. backslash at end of line), strip leading and/or
    trailing whitespace, and collapse internal whitespace.  All of these
    are optional and independently controllable.

    Provides a 'warn()' method so you can generate warning messages that
    report physical line number, even if the logical line in question
    spans multiple physical lines.  Also provides 'unreadline()' for
    implementing line-at-a-time lookahead.

    Constructor is called as:

        TextFile (filename=None, file=None, **options)

    It bombs (RuntimeError) if both 'filename' and 'file' are None;
    'filename' should be a string, and 'file' a file object (or
    something that provides 'readline()' and 'close()' methods).  It is
    recommended that you supply at least 'filename', so that TextFile
    can include it in warning messages.  If 'file' is not supplied,
    TextFile creates its own using the 'open()' builtin.

    The options are all boolean, and affect the value returned by
    'readline()':

      strip_comments [default: true]
        strip from "#" to end-of-line, as well as any whitespace
        leading up to the "#" -- unless it is escaped by a backslash
      lstrip_ws [default: false]
        strip leading whitespace from each line before returning it
      rstrip_ws [default: true]
        strip trailing whitespace (including line terminator!) from
        each line before returning it
      skip_blanks [default: true]
        skip lines that are empty *after* stripping comments and
        whitespace.  (If both lstrip_ws and rstrip_ws are false,
        then some lines may consist of solely whitespace: these will
        *not* be skipped, even if 'skip_blanks' is true.)
      join_lines [default: false]
        if a backslash is the last non-newline character on a line
        after stripping comments and whitespace, join the following line
        to it to form one "logical line"; if N consecutive lines end
        with a backslash, then N+1 physical lines will be joined to
        form one logical line.
      collapse_ws [default: false]
        after stripping comments and whitespace and joining physical
        lines into logical lines, all internal whitespace (strings of
        whitespace surrounded by non-whitespace characters, and not at
        the beginning or end of the logical line) will be collapsed
        to a single space.

    Note that since 'rstrip_ws' can strip the trailing newline, the
    semantics of 'readline()' must differ from those of the builtin file
    object's 'readline()' method!  In particular, 'readline()' returns
    None for end-of-file: an empty string might just be a blank line (or
    an all-whitespace line), if 'rstrip_ws' is true but 'skip_blanks' is
    not."""
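
To make the option semantics above concrete, here is a minimal sketch of
how they could interact.  It is not the TextFile code itself:
'MiniTextFile' is a hypothetical, simplified stand-in written only from
the docstring, it takes a file object rather than a filename, and it
leaves out 'warn()' and 'unreadline()'.

```python
import io
import re


class MiniTextFile:
    """Hypothetical stand-in for the TextFile described above (a sketch)."""

    def __init__(self, file, strip_comments=True, lstrip_ws=False,
                 rstrip_ws=True, skip_blanks=True, join_lines=False,
                 collapse_ws=False):
        self.file = file
        self.strip_comments = strip_comments
        self.lstrip_ws = lstrip_ws
        self.rstrip_ws = rstrip_ws
        self.skip_blanks = skip_blanks
        self.join_lines = join_lines
        self.collapse_ws = collapse_ws
        self.current_line = 0  # physical line number of the last line read

    def _clean(self, line):
        """Apply comment and whitespace stripping to one physical line."""
        if self.strip_comments:
            # Cut at the first "#" that is not escaped by a backslash,
            # together with any whitespace leading up to it.
            match = re.search(r"(?<!\\)#", line)
            if match:
                eol = "\n" if line.endswith("\n") else ""
                line = line[:match.start()].rstrip() + eol
            line = line.replace("\\#", "#")
        if self.lstrip_ws:
            line = line.lstrip()
        if self.rstrip_ws:
            line = line.rstrip()  # this also removes the line terminator
        return line

    def readline(self):
        """Return the next logical line, or None at end-of-file."""
        pending = ""
        while True:
            raw = self.file.readline()
            if raw == "":
                # Real end-of-file; hand back a dangling continuation if any.
                return pending or None
            self.current_line += 1
            line = self._clean(raw)
            if self.join_lines and line.rstrip("\n").endswith("\\"):
                # Drop the backslash and keep building the logical line.
                pending += line.rstrip("\n")[:-1]
                continue
            line = pending + line
            pending = ""
            if self.skip_blanks and line in ("", "\n"):
                continue
            if self.collapse_ws:
                # Only whitespace with non-whitespace on both sides.
                line = re.sub(r"(?<=\S)\s+(?=\S)", " ", line)
            return line


sample = io.StringIO(
    "# leading comment\n"
    "\n"
    "include *.py   # trailing comment\n"
    "exclude a \\\n"
    "b    c\n"
)
tf = MiniTextFile(sample, join_lines=True, collapse_ws=True)
lines = []
while True:
    line = tf.readline()
    if line is None:
        break
    lines.append(line)
print(lines)
```

With these settings the comment-only line and the blank line are
skipped, the trailing comment is removed, and the two physical lines
joined by the backslash come back as one logical line with its internal
whitespace collapsed.
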
Interested in having something like this in the core? Adding more
options is possible, but the code is already on the hairy side to
support all of these. And I'm not a big fan of the subtle difference in
semantics with file objects, but honestly couldn't think of a better way
at the time.
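
The practical consequence of that difference shows up in the usual read
loop.  The sketch below uses a hypothetical helper,
'stripped_readline()', that mimics only the convention described in the
docstring (strip trailing whitespace, return None at end-of-file); it is
an illustration, not part of TextFile.

```python
import io


def stripped_readline(f):
    """Hypothetical helper: like f.readline(), but strips trailing
    whitespace and returns None (not "") at end-of-file."""
    raw = f.readline()
    if raw == "":
        return None
    return raw.rstrip()


f = io.StringIO("first\n\nlast\n")

# The idiom that works for builtin file objects stops too early here,
# because the blank second line comes back as "" and tests false.
wrong = []
while True:
    line = stripped_readline(f)
    if not line:
        break
    wrong.append(line)

# Testing for None explicitly reads the whole file.
f.seek(0)
right = []
while True:
    line = stripped_readline(f)
    if line is None:
        break
    right.append(line)

print(wrong)
print(right)
```

Code that treats any false value as end-of-file silently drops
everything after the first blank line, so callers need the explicit
'is None' test whenever blank lines are not being skipped.
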
If you're interested, you can download it from
http://www.mems-exchange.org/exchange/software/python/text_file/
or just use the version in the Distutils CVS tree.
Greg