[Tutor] parsing configuration files

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Dec 22 19:20:12 EST 2003



On Mon, 22 Dec 2003, Daniel Ehrenberg wrote:

> I'm trying to write a program to make it easier to
> automate the making of GTK+ GUIs by making
> configuration files that use a simplified YAML-like
> syntax (see yaml.org; it looks kinda like Python).

Sounds good.  Let's take a look at the code:


> def parsefile(filename):
>   stuff2parse = open(filename)
>   data = {}
>   for line in stuff2parse.xreadlines():
>     if line:
>       if not line.startswith(' '):
>         currentSection = line[:-1]
>         data[currentSection] = []
>       else:
>         try:
>           data[currentSection] += line.strip()
>         except: pass
>
>   return data


Side note: it might be nicer to have parsefile() take in a file-like
object, rather than a filename.

###
def parsefile(stuff2parse):
    data = {}
    for line in stuff2parse:
        ...
###


This adjustment helps because the code becomes easier to test: we can
make a string look like a file with the StringIO module:

    http://www.python.org/doc/lib/module-StringIO.html

and be able to quickly test it with:

###
from StringIO import StringIO
sample_conffile = '''
gRadio:
    _Gedit
    gedit
    simple text editor

sRadio:
    _Synaptic
    synaptic
    GUI for apt-get
'''

print parsefile(StringIO(sample_conffile))
###

and it might be a good idea to make this a formal unit test.  But I think
I'm getting off the subject.  *grin*


Looking at the main loop:

###
    for line in stuff2parse:
        if line:
            if not line.startswith(' '):
                currentSection = line[:-1]
                data[currentSection] = []
            else:
                try:
                    data[currentSection] += line.strip()
                except: pass
###

There's a subtle type error going on, and it has to do with 'data'.  In
the first case,

    data[currentSection] = []

shows that data[currentSection] must be a list, so we need to deal with it
with list methods.  The second block:

    data[currentSection] += line.strip()

tries to deal with it as if it were a string!  What ends up happening is
something akin to:

###
>>> l = []
>>> l += "hello world"
>>> l
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
###


By the way, this is bizarre!  *grin* I had really expected a TypeError at
this point, like:

###
>>> [] + "foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can only concatenate list (not "str") to list
###

but the difference comes from how the '+=' operator behaves on lists: it
is equivalent to list.extend(), which accepts any iterable, including a
string:

###
>>> l = []
>>> l.extend("hello")
>>> l
['h', 'e', 'l', 'l', 'o']
###

This explains why we got such strange string-to-character-transforming
behavior from '+='.  But this is surprising; I'll have to keep my eye out
for this next time it happens.  *grin*
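
To make the contrast concrete, here is a small sketch (written in Python 3
syntax, though the list behavior is the same in Python 2) showing that '+='
takes any iterable, while plain '+' only accepts another list.  The names
'items' and 'plus_failed' are made up for illustration:

```python
items = []
items += "ab"              # '+=' iterates over the string, like extend()
items += ("c", "d")        # any iterable works, not only lists
items += ["whole line"]    # wrapping the string in a list keeps it intact

try:
    [] + "foo"             # plain '+' insists on list + list
    plus_failed = False
except TypeError:
    plus_failed = True

print(items)        # ['a', 'b', 'c', 'd', 'whole line']
print(plus_failed)  # True
```

So 'items += [some_string]' would also work for adding one whole string,
although append() says the same thing more directly.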

Anyway, you meant to use the append() method of lists, so instead of:

    data[currentSection] += line.strip()

use

    data[currentSection].append(line.strip())

and that should fix the problem.
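
Putting the pieces together, here is a rough sketch of the fixed function
with a small unit test around it.  A few things in it are assumptions
rather than part of the original code: it is written for Python 3 (so
StringIO comes from the 'io' module), it skips blank lines (the original
'if line:' test never filters them, since each line still carries its
newline), it replaces the bare try/except with an explicit check that a
section has been seen, and the names SAMPLE and ParsefileTest are invented
for the example:

```python
import unittest
from io import StringIO  # in Python 2 this lives in the StringIO module


def parsefile(stuff2parse):
    """Parse a file-like object into {section header: [indented lines]}."""
    data = {}
    currentSection = None
    for line in stuff2parse:
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        if not line.startswith(' '):
            # Unindented line: start a new section, dropping the newline.
            currentSection = line.rstrip('\n')
            data[currentSection] = []
        elif currentSection is not None:
            # Indented line: append the whole stripped line, not its characters.
            data[currentSection].append(line.strip())
    return data


SAMPLE = '''
gRadio:
    _Gedit
    gedit
    simple text editor

sRadio:
    _Synaptic
    synaptic
    GUI for apt-get
'''


class ParsefileTest(unittest.TestCase):
    def test_sections_hold_whole_lines(self):
        result = parsefile(StringIO(SAMPLE))
        self.assertEqual(result['gRadio:'],
                         ['_Gedit', 'gedit', 'simple text editor'])
        self.assertEqual(result['sRadio:'],
                         ['_Synaptic', 'synaptic', 'GUI for apt-get'])
```

Saved to a file, the test can be run with 'python -m unittest' followed by
the file name.  Note that the section keys still include the trailing
colon, just as in the original code.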



One other style note: when you're iterating over a file, you can just say:

   for line in some_file:

Files are iterable in Python, so there's no more need to say:

   for line in some_file.xreadlines():

as long as you're using Python 2.2 or later.
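
A quick way to see this in action without a real file is a sketch like the
following, again assuming Python 3's io.StringIO as the stand-in for a file
(the name 'some_file' follows the snippet above; 'lines' is made up):

```python
from io import StringIO

some_file = StringIO("first\nsecond\n")

# Iterating directly over the file-like object yields one line at a time,
# and each line keeps its trailing newline.
lines = [line for line in some_file]

print(lines)  # ['first\n', 'second\n']
```

The trailing newline on each line is why the parsing code has to strip or
slice it off before using the text.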


I hope this helps!



