How to efficiently extract information from structured text file
Imaginationworks
xiajunyi at gmail.com
Wed Feb 17 18:37:02 EST 2010
On Feb 17, 1:40 pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Feb 16, 5:48 pm, Imaginationworks <xiaju... at gmail.com> wrote:
>
> > Hi,
>
> > I am trying to read object information from a text file (approx.
> > 30,000 lines) with the following format, where each line corresponds to a
> > line in the text file. Currently, I read the whole file into a
> > list of strings using readlines(), then use a for loop to search for "= {"
> > and "};" to determine the Object, SubObject, and SubSubObject.
>
> If you open(filename).read() this file into a variable named data, the
> following pyparsing parser will pick out your nested brace
> expressions:
>
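> from pyparsing import *
>
> # punctuation is matched but suppressed from the results
> EQ,LBRACE,RBRACE,SEMI = map(Suppress,"={};")
> ident = Word(alphas, alphanums)
> contents = Forward()   # recursive: a body may contain nested definitions
> defn = Group(ident + EQ + Group(LBRACE + contents + RBRACE + SEMI))
>
> # a body is any mix of nested definitions and other non-brace tokens
> contents << ZeroOrMore(defn | ~(LBRACE|RBRACE) + Word(printables))
>
> results = defn.parseString(data)
>
> print(results)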
>
> Prints:
>
> [
> ['Object1',
> ['...',
> ['SubObject1',
> ['....',
> ['SubSubObject1',
> ['...']
> ]
> ]
> ],
> ['SubObject2',
> ['....',
> ['SubSubObject21',
> ['...']
> ]
> ]
> ],
> ['SubObjectN',
> ['....',
> ['SubSubObjectN',
> ['...']
> ]
> ]
> ]
> ]
> ]
> ]
>
> -- Paul
Wow, that is great! Thanks
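For comparison, the readlines()-plus-loop approach from the original question can also track the nesting if it keeps a stack of the currently open objects. The following is a minimal sketch, not the method either poster used. It assumes each `Name = {` opener and each `};` closer sits on its own line, because the actual sample format was not included in the quoted text. The function names `parse_objects` and `print_tree` are hypothetical.

```python
def parse_objects(lines):
    """Build a tree of nested 'Name = {' ... '};' blocks from text lines.

    Assumption (hypothetical format): every opener looks like 'Name = {'
    and every closer is exactly '};', each on its own line.
    """
    root = {"name": None, "body": [], "children": []}
    stack = [root]  # stack[-1] is the innermost object still open
    for raw in lines:
        line = raw.strip()
        if line.endswith("= {"):
            # opener: create a child of the current object and descend into it
            node = {"name": line[:-3].strip(), "body": [], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line == "};":
            # closer: return to the enclosing object
            if len(stack) == 1:
                raise ValueError("unbalanced '};'")
            stack.pop()
        elif line:
            # any other non-blank line belongs to the innermost open object
            stack[-1]["body"].append(line)
    if len(stack) != 1:
        raise ValueError("unclosed '= {' block")
    return root["children"]


def print_tree(nodes, depth=0):
    """Print object names, indented by nesting depth."""
    for node in nodes:
        print("  " * depth + node["name"])
        print_tree(node["children"], depth + 1)


sample = """\
Object1 = {
  ...
  SubObject1 = {
    ....
    SubSubObject1 = {
      ...
    };
  };
  SubObject2 = {
    ....
  };
};
"""

tree = parse_objects(sample.splitlines())
print_tree(tree)
# Object1
#   SubObject1
#     SubSubObject1
#   SubObject2
```

To run it on the real file, you can iterate over the file object directly instead of calling readlines(), e.g. `with open(filename) as f: tree = parse_objects(f)`. This makes a single pass over the ~30,000 lines. Unlike the pyparsing grammar quoted above, this line-based version does not handle an opening or closing brace that shares a line with other content.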
More information about the Python-list mailing list