Parsing problems: A journey from a text file to a directory tree

John Machin sjmachin at lexicon.net
Tue Sep 18 18:31:05 EDT 2007


On Sep 19, 4:51 am, "Michael J. Fromberger"
<Michael.J.Fromber... at Clothing.Dartmouth.EDU> wrote:
> .
> .    # This expression matches "header" lines, defining a new section.
> .    new_re  = re.compile(r'\[([\w ]+)\]\s*$')

Directory names can contain many more characters than those which
match [\w ] ... and which ones are allowed depends on the OS; you might
as well just allow anything, and leave it to the OS to complain. Also
consider using line.rstrip() (usually a handy precaution on ANY input
text file) instead of having \s*$ at the end of your regex.

> .
> .            while new_level < len(state):
> .                state.pop()

Hmmm ... consider rewriting that as the slightly less obfuscatory

    while len(state) > new_level:
        state.pop()

If you really want to make the reader slow down and think, try this:

    del state[new_level:]
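For anyone who does slow down: both forms truncate the list in place to
its first new_level entries. The helper names below are hypothetical; the
loop at the end just checks that the two agree.

```python
def trim_with_loop(state, new_level):
    # Pop one entry at a time until only new_level entries remain.
    while len(state) > new_level:
        state.pop()

def trim_with_slice(state, new_level):
    # Delete everything from index new_level onward in one statement.
    # Like the loop, this does nothing if new_level >= len(state).
    del state[new_level:]

original = ['root', 'foo', 'bar', 'plugh']
for level in range(len(original) + 2):
    a, b = list(original), list(original)
    trim_with_loop(a, level)
    trim_with_slice(b, level)
    assert a == b == original[:level]
```

Both mutate the same list object rather than rebinding the name, so any
other reference to state sees the change.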

A warning message might be a good idea when a line has too many "-"
characters, i.e. it skips a nesting level, as "plugh" does here:

[foo]
|-bar
|-zot
|---plugh
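One way to do it, as a sketch: assume the depth of an entry is just the
number of "-" characters after the "|", and warn when a line is more than
one level deeper than the line before it. The pattern entry_re and the
function entry_depth are hypothetical, not taken from the quoted code.

```python
import re
import warnings

# Hypothetical entry pattern: "|" then one or more "-" then the name.
entry_re = re.compile(r'\|(-+)(.+)$')

def entry_depth(line, prev_depth):
    """Return (depth, name) for an entry line.

    prev_depth is the depth of the previous line (assumed 0 right after a
    "[section]" header). Warns if the entry skips one or more levels.
    """
    m = entry_re.match(line.rstrip())
    if m is None:
        raise ValueError('not an entry line: %r' % line)
    depth = len(m.group(1))
    if depth > prev_depth + 1:
        warnings.warn('%r skips %d level(s)'
                      % (line.rstrip(), depth - prev_depth - 1))
    return depth, m.group(2)

# The example above: [foo] then |-bar, |-zot, |---plugh.
depth = 0
for line in ['|-bar', '|-zot', '|---plugh']:
    depth, name = entry_depth(line, depth)  # warns on '|---plugh'
```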

> .
> .            state[-1][key] = {}
> .            state.append(state[-1][key])
> .

And if the input line matches neither regex?
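A sketch of one answer: check each pattern in turn and raise, with the
line number, when nothing matches, instead of letting the line fall
through. The two patterns and the classify() generator are stand-ins for
illustration; skipping blank lines is also an assumption.

```python
import re

# Hypothetical stand-ins for the two regexes in the quoted code.
header_re = re.compile(r'\[(.+)\]$')
entry_re = re.compile(r'\|(-+)(.+)$')

def classify(lines):
    """Yield ('header', name) or ('entry', (depth, name)) for each line.

    Raises ValueError on any non-blank line that matches neither pattern.
    """
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip()
        if not line:
            continue  # blank lines assumed harmless
        m = header_re.match(line)
        if m:
            yield 'header', m.group(1)
            continue
        m = entry_re.match(line)
        if m:
            yield 'entry', (len(m.group(1)), m.group(2))
            continue
        raise ValueError('line %d matches neither pattern: %r'
                         % (lineno, raw))
```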

> .    return out
>
> To call this, pass a file-like object to parse_folders(), e.g.:
>
> test1 = '''
> [New client].

That won't work with the dot on the end: new_re only allows whitespace
after the closing "]".
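A quick check, with new_re copied verbatim from the quoted code:

```python
import re

# new_re exactly as in the quoted code.
new_re = re.compile(r'\[([\w ]+)\]\s*$')

print(bool(new_re.match('[New client]')))   # True
print(bool(new_re.match('[New client].')))  # False: "." is not \s
```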

> Michael J. Fromberger             | Lecturer, Dept. of Computer Science