How to efficiently extract information from structured text file
Paul McGuire
ptmcg at austin.rr.com
Wed Feb 17 14:40:17 EST 2010
On Feb 16, 5:48 pm, Imaginationworks <xiaju... at gmail.com> wrote:
> Hi,
>
> I am trying to read object information from a text file (approx.
> 30,000 lines) with the following format, where each line corresponds
> to a line in the text file. Currently, the whole file is read into a
> list of strings using readlines(), and then a for loop searches for
> "= {" and "};" to determine the Object, SubObject, and SubSubObject.
If you open(filename).read() this file into a variable named data, the
following pyparsing parser will pick out your nested brace
expressions:
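from pyparsing import *

# '=', '{', '}' and ';' are matched but kept out of the results
EQ,LBRACE,RBRACE,SEMI = map(Suppress,"={};")
ident = Word(alphas, alphanums)
# contents is recursive, so declare it first and define it below
contents = Forward()
# a definition has the form:  name = { contents };
defn = Group(ident + EQ + Group(LBRACE + contents + RBRACE + SEMI))
# contents is any mix of nested definitions and other non-brace words
contents << ZeroOrMore(defn | ~(LBRACE|RBRACE) + Word(printables))

results = defn.parseString(data)
print(results)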
Prints:
[
  ['Object1',
    ['...',
      ['SubObject1',
        ['....',
          ['SubSubObject1',
            ['...']
          ]
        ]
      ],
      ['SubObject2',
        ['....',
          ['SubSubObject21',
            ['...']
          ]
        ]
      ],
      ['SubObjectN',
        ['....',
          ['SubSubObjectN',
            ['...']
          ]
        ]
      ]
    ]
  ]
]
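
Once you have that nested list, pulling out the object hierarchy is a
short recursive walk. The following is only a rough sketch: the walk()
helper and the name nested are mine, not part of pyparsing, and the
list is copied by hand from the output above (with a real parse you
could probably pass results.asList() instead). It assumes every inner
list is a [name, body] pair, which is what the Group expressions in
the grammar produce.

```python
# Nested list copied from the printed output above; '...' and '....'
# stand in for whatever ordinary tokens each body really contains.
nested = [
    ['Object1',
        ['...',
            ['SubObject1', ['....', ['SubSubObject1', ['...']]]],
            ['SubObject2', ['....', ['SubSubObject21', ['...']]]],
            ['SubObjectN', ['....', ['SubSubObjectN', ['...']]]],
        ]
    ]
]

def walk(defns, depth=0):
    """Yield (depth, name) for each [name, body] pair, outermost first."""
    for item in defns:
        # Lists are nested definitions; plain strings are ordinary
        # body tokens and are skipped here.
        if isinstance(item, list):
            name, body = item
            yield depth, name
            for pair in walk(body, depth + 1):
                yield pair

for depth, name in walk(nested):
    print("  " * depth + name)
```

This prints each object name indented by its nesting level, so
SubObject1 appears under Object1 and SubSubObject1 under SubObject1.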
-- Paul