Parsing problems: A journey from a text file to a directory tree
John Machin
sjmachin at lexicon.net
Tue Sep 18 18:31:05 EDT 2007
On Sep 19, 4:51 am, "Michael J. Fromberger"
<Michael.J.Fromber... at Clothing.Dartmouth.EDU> wrote:
> .
> . # This expression matches "header" lines, defining a new section.
> . new_re = re.compile(r'\[([\w ]+)\]\s*$')
Directory names can contain many more characters than those matched by
[\w ], and which ones are legal depends on the OS; you might as well
allow anything and leave it to the OS to complain. Also consider
using line.rstrip() (usually a handy precaution on ANY input text
file) instead of having \s*$ at the end of your regex.
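A minimal sketch of that suggestion, assuming the header line has the form "[name]". The names header_re and header_name are made up for illustration and are not from the original code.

```python
import re

# Hypothetical relaxed header pattern: anything between the brackets is
# accepted, and the OS gets to reject names it does not like.
header_re = re.compile(r'\[(.+)\]$')

def header_name(line):
    """Return the section name from a "[name]" line, or None."""
    # rstrip() removes trailing whitespace and line endings, so the
    # regex no longer needs \s* before the $.
    m = header_re.match(line.rstrip())
    return m.group(1) if m else None
```

With this, header_name("[R&D (2007)]\r\n") yields the name between the brackets, while a line such as "|-bar" yields None.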
> .
> . while new_level < len(state):
> .     state.pop()
Hmmm ... consider rewriting that as the slightly less obfuscatory
    while len(state) > new_level:
        state.pop()
If you really want to make the reader slow down and think, try this:
    del state[new_level:]
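For anyone who wants to convince themselves the two forms agree, here is a small check. It assumes state is a plain list and 0 <= new_level; with a negative new_level the loop would pop from an empty list and raise, while the slice would not. The function names are invented for this sketch.

```python
def truncate_loop(state, new_level):
    """Pop entries until at most new_level of them remain."""
    while len(state) > new_level:
        state.pop()
    return state

def truncate_slice(state, new_level):
    """Same effect via slice deletion, for 0 <= new_level."""
    del state[new_level:]
    return state

# Compare both forms on fresh copies, including levels past the end,
# where neither form changes the list.
for new_level in range(7):
    assert truncate_loop(list("abcd"), new_level) == \
           truncate_slice(list("abcd"), new_level)
```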
A warning message if there are too many "-" characters might be a good
idea:
[foo]
|-bar
|-zot
|---plugh
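One way such a check could look, as a rough sketch only. It assumes the depth of an entry is the number of "-" characters after the "|" and that a "[header]" line resets the depth to 0; item_re and find_depth_jumps are hypothetical names, not part of the posted code.

```python
import re

# Hypothetical pattern for "|-name" lines; the run of dashes gives the depth.
item_re = re.compile(r'\|(-+)(.*)$')

def find_depth_jumps(lines):
    """Return (line_number, text) for each entry nested more than one
    level deeper than the line before it."""
    problems = []
    depth = 0  # depth of the previous line; a "[header]" counts as 0
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip()
        m = item_re.match(line)
        if m:
            new_depth = len(m.group(1))
            if new_depth > depth + 1:
                problems.append((lineno, line))
            depth = new_depth
        elif line.startswith('['):
            depth = 0
    return problems
```

Fed the four lines of the example above, it reports only the "|---plugh" line, which jumps from depth 1 to depth 3. The caller can then decide whether to warn or to bail out.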
> .
> . state[-1][key] = {}
> . state.append(state[-1][key])
> .
And if the input line matches neither regex?
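A sketch of one possible answer: classify each line and complain loudly about anything unrecognised instead of silently skipping it. The patterns and the classify function are assumptions made for illustration; the second regex in the original code may well differ.

```python
import re

# Both patterns are illustrative stand-ins, not the original ones.
header_re = re.compile(r'\[(.+)\]$')
item_re = re.compile(r'\|(-+)(.+)$')

def classify(line, lineno):
    """Return ('blank', None), ('header', name) or ('item', (depth, name)).

    Raises ValueError for a non-blank line that matches neither pattern.
    """
    text = line.rstrip()
    if not text:
        return ('blank', None)
    m = header_re.match(text)
    if m:
        return ('header', m.group(1))
    m = item_re.match(text)
    if m:
        return ('item', (len(m.group(1)), m.group(2)))
    raise ValueError("line %d: unrecognised input: %r" % (lineno, text))
```

Raising is only one choice; logging a warning and carrying on would also do, as long as the bad line does not vanish without trace. Note that a header with trailing junk, such as "[New client].", ends up in the ValueError branch here.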
> . return out
>
> To call this, pass a file-like object to parse_folders(), e.g.:
>
> test1 = '''
> [New client].
Won't work with the dot on the end.
> Michael J. Fromberger | Lecturer, Dept. of Computer Science